Abstract

This paper proposes a framework of denotational semantics of database type systems and constructs
a  type  system  for  complex  database  objects.  Starting  with  an  abstract  analysis  of  the  relational  model, 
we  develop  a  mathematical  theory  for  the  structures  of  domains  of  database  objects.  Based  on  this 
framework,  we  construct  a  concrete  database  type  system  and  its  semantic  domain.  The  type  system 
allows  arbitrarily  complex  structures  that  can  be  constructed  using  labeled  records,  labeled  variants, 
finite  sets  and  recursion.  On  the  semantic  domain,  in  addition  to  standard  operations  on  records, 
variants  and  sets,  a  join  and  a  projection  are  available  as  polymorphically  typed  computable  functions 
on  arbitrarily  complex  objects.  We  then  show  that  both  the  type  system  and  the  semantic  domain 
can  be  uniformly  integrated  in  an  ML-like  programming  language.  This  leads  us  to  develop  a  database 
programming  language  that  supports  rich  data  structures  and  powerful  operations  for  databases  while 
enjoying  desirable  features  of  modern  type  systems  of  programming  languages  including  strong  static 
type-checking,  static  type  inference  and  ML  polymorphism. 


1  Introduction 

There have been a number of attempts to develop data models to represent complex database objects beyond
the first-normal-form relational model. Examples include nested relations [22, 48, 46] and complex object
models [31, 8, 2]. (See also [29] for a survey.) However, these complex data structures and associated database
operations  have  not  been  well  integrated  in  a  modern  type  system  of  a  programming  language,  creating  the 
problem  known  as  “impedance  mismatch”  [39,  7].  As  a  result,  database  programming  cannot  share  the 
benefits  of  recent  developments  in  type  theories  of  programming  languages  such  as  static  type  inference 
[40, 20] and polymorphism [40, 47], which should have clear practical benefits for many database
applications. The problem is apparent from the fact that no existing polymorphic type system can
represent even the relational model, perhaps the simplest form of a “complex object” model. As pointed
out in [6], no existing type system can type-check a polymorphic natural join operation. Several languages
have been proposed to integrate database structures into a programming language [52, 4, 5, 17, 16, 42].
(See also [6] for a survey.) However, their type systems are either dynamic or rather limited and
incorporate neither static type inference nor polymorphism.

* This research was supported in part by grants NSF IRI86-10617, ARO DA A6- 29-84- k-0061 , and by funding from AT&T's

The  author  believes  that  the  major  source  of  this  mismatch  problem  is  the  poor  understanding  of  the 
properties  of  types  for  databases  and  the  structures  of  domains  of  database  objects.  Traditionally,  the  theory 
of  types  of  programming  languages  has  been  focussed  on  function  types  and  domains  of  functions.  Neither 
the  properties  of  database  type  systems  nor  their  relationship  to  type  systems  of  programming  languages 
have  been  well  investigated.  The  goal  of  this  paper  is  to  construct  a  theory  of  database  type  systems  that 
will serve as a “bridge” between complex data models and type systems of programming languages and to
propose a concrete database type system that is rich enough to represent a wide range of complex database
objects. These should enable us to develop a strongly typed database programming language that supports
rich  data  structures  and  powerful  operations  for  databases  while  enjoying  desirable  features  of  modern  type 
systems  of  programming  languages  including  static  type  inference  and  ML  polymorphism. 

As  suggested  by  Cardelli  [14],  one  way  to  represent  complex  objects  in  a  programming  language  is  to  use 
labeled  records  and  labeled  disjoint  unions  (or  labeled  variants)  found  in  many  programming  languages  such 
as Pascal, Standard ML [25], Amber [15] and Galileo [4]. The following is an example of a labeled record
expression:

[Name = [Firstname = "Joe", Lastname = "Doe"], Dept = "Sales", Office = 278]

Types for expressions can be easily defined. For example, the above record is given the following type:

[Name : [Firstname : string, Lastname : string], Dept : string, Office : int]

Tuples in the relational model are regarded as labeled records that contain only atomic values. In programming
languages, these data structures are inductively defined, allowing arbitrarily nested structures. Some
languages  also  support  recursively  defined  types  and  expressions.  On  these  complex  expressions,  various 
operations  are  available.  Assuming  computable  equality  on  each  atomic  type,  equality  on  expressions  that 
do  not  contain  functions  is  computable  and  it  is  not  hard  to  introduce  set  expressions  on  those  complex 
expressions.  A  database  of  complex  objects  could  then  be  represented  as  a  set  of  these  complex  expressions. 

An obvious problem with this approach is that, in practice, both expressions and sets become very large
and contain a great deal of redundancy. This problem is elegantly solved in the relational model by the
introduction of two operations: the (natural) join and the projection. Instead of representing a database
as one large set (relation) of large tuples, we can first project it onto various small relations and then
represent a database as a collection of those small relations. Larger relations are obtained by joining these
small  relations  when  needed.  In  order  to  support  complex  database  objects  in  a  programming  language, 
it  is  therefore  essential  to  support  a  join  and  a  projection  on  complex  expressions.  We  further  believe 
that  properly  generalized  join  and  projection  together  with  standard  operations  on  complex  expressions 
form  a  sufficiently  rich  set  of  operations  for  complex  database  objects.  Furthermore,  integration  of  them 
into  a  modern  type  system  of  a  programming  language  yields  a  database  programming  language  in  which 
databases are directly representable as typed data structures and a powerful set of operations is available as
typed polymorphic functions. Such a programming language should also be suitable for other data-intensive
applications such as natural language processing and knowledge representation. We therefore hope that the
integration will also contribute to solving the “high-level” impedance mismatch between database systems
and other applications.

The  join  and  the  projection  in  the  relational  model  are  based  on  the  underlying  operations  that  compute 
a  join  of  tuples  and  a  projection  of  a  tuple.  By  regarding  tuples  as  partial  descriptions  of  real-world  entities, 
we can characterize these operations as special cases of very general operations on partial descriptions:
one that combines two consistent descriptions and one that throws away part of a given description. For
example, if we consider the following non-flat tuples

t1 = [Name = [Firstname = "Joe"]]

and

t2 = [Name = [Lastname = "Doe"]]

as partial descriptions, then the combination of the two should be

t = [Name = [Firstname = "Joe", Lastname = "Doe"]]

Conversely, the tuple t1 is considered as the result of the projection of the partial description t on the
structure specified by the type

[Name : [Firstname : string]].

Operations that combine partial information also arise in other areas of applications. Examples include the
“meet operation” on Aït-Kaci’s ψ-terms [3] and the “unification operation” on feature structures representing
linguistic information (see [55] for a survey).
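
The intuition of combining and trimming partial descriptions can be illustrated with a minimal Python sketch. The encoding is an assumption made only for illustration: a nested record is a `dict`, atomic values are strings or integers, and the names `combine`, `project` and `Inconsistent` are hypothetical. The general join and projection of the paper are defined only in later sections; this sketch covers nested records without sets, variants or recursion.

```python
class Inconsistent(Exception):
    """Raised when two partial descriptions disagree on some field."""


def combine(d1, d2):
    """Combine two consistent partial descriptions (nested dicts)."""
    if isinstance(d1, dict) and isinstance(d2, dict):
        result = dict(d1)
        for label, value in d2.items():
            # Shared labels are combined recursively; new labels are added.
            result[label] = combine(d1[label], value) if label in d1 else value
        return result
    if d1 == d2:
        return d1
    raise Inconsistent(f"{d1!r} conflicts with {d2!r}")


def project(d, shape):
    """Keep only the part of d named by shape (nested dicts; leaves are type names)."""
    if isinstance(shape, dict):
        return {label: project(d[label], sub) for label, sub in shape.items()}
    return d


t1 = {"Name": {"Firstname": "Joe"}}
t2 = {"Name": {"Lastname": "Doe"}}
t = combine(t1, t2)
```

Here `t` carries both name fields, and projecting `t` onto the shape `{"Name": {"Firstname": "string"}}` gives back `t1`.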

Based  on  this  general  intuition,  in  this  paper,  we  propose  a  framework  of  denotational  semantics  for 
database  type  systems  and  construct  a  concrete  database  type  system  and  its  semantic  domain.  The  type 
system  contains  arbitrarily  complex  expressions  definable  by  labeled  records,  labeled  variants,  finite  sets  and 
recursion.  On  its  semantic  domain,  a  join  and  a  projection  are  defined  as  polymorphically  typed  computable 
functions. Furthermore, we carry out this construction in a completely effective way. In our framework,
we require types and objects to be finitely representable and various properties to be effectively computable.
This means that, once we have constructed the type system and its semantic domain based on our framework,
it not only provides a uniform and elegant explanation of the properties of the type system and the structures of
the domains of complex database objects, but also provides representations and algorithms to integrate them
into a practical programming language. Based on these results, an experimental programming language,
Machiavelli [45], has been developed at the University of Pennsylvania.

The  rest  of  this  paper  is  organized  as  follows.  In  section  2,  we  analyze  the  relational  model  as  a  typed 
data  structure  and  extract  the  essence  of  the  join  and  the  projection.  This  analysis  will  also  serve  as 
an  introduction  to  the  subsequent  abstract  characterizations  of  database  type  systems  and  their  semantic 
domains.  Based  on  the  analysis  of  the  relational  model,  in  section  3,  we  characterize  the  structures  of  type 
systems  in  which  a  polymorphic  join  and  a  polymorphic  projection  are  definable  and  propose  a  framework 
for  their  semantic  domains.  In  section  4,  we  define  a  concrete  type  system  for  complex  database  objects  and 
construct  its  semantic  domain.  A  part  of  the  construction  of  the  semantic  domain  (section  4.5)  is  based  on 
the  idea  developed  in  [13]  that  a  certain  ordering  on  powerdomains  can  be  used  to  generalize  the  relational 
join uniformly to complex objects and the idea due to Aït-Kaci [3] that a rich yet computationally feasible
domain of values is nicely represented by labeled regular trees. In revising this paper, the author also noticed
that  Rounds’  recent  work  [49]  achieves  results  similar  to  the  ones  presented  in  section  4.5  using  a  slightly 
different  framework.  Finally  in  section  5,  we  show  that  the  type  system  and  its  semantic  domain  can  be 
integrated  in  an  ML-like  programming  language. 


2  Analysis  of  the  Relational  Model 

We  first  give  a  standard  definition  of  the  relational  model.  Since  our  purpose  is  to  extract  the  essence  of  the 
type  structure  of  the  model,  we  define  the  model  as  a  typed  data  structure.  We  also  integrate  null  values 
in  the  model.  The  importance  of  null  values  has  been  widely  recognized  and  several  approaches  have  been 
proposed  [9,  53,  34,  58].  Among  them,  we  adopt  the  approach  that  null  values  represent  non-informative 
values [58]. This approach fits well with our paradigm that database objects are partial descriptions and plays a
crucial role in our theory of semantic domains of database type systems, developed in the next section.

Let $\mathcal{L}$ be a countably infinite set of labels. We assume that we are given a set $B$ of base types and a set
of atomic objects $B_b$ for each $b \in B$. For each base type $b$, we denote by $null_b$ the null value of the type $b$.

Definition 1 (Tuples and Relations) A tuple type $\tau$ is a term of the form $[l_1 : b_1, \ldots, l_n : b_n]$ where
$l_1, \ldots, l_n \in \mathcal{L}$ and $b_1, \ldots, b_n \in B$. A tuple $t$ of the tuple type $[l_1 : b_1, \ldots, l_n : b_n]$ is a term of the form
$[l_1 = c_1, \ldots, l_n = c_n]$ such that $c_i \in B_{b_i}$ or $c_i = null_{b_i}$, $1 \le i \le n$. A relation type (or relation scheme in the
database literature) $\rho$ is a term of the form $\{\!\{\tau\}\!\}$ for some tuple type $\tau$. A relation instance $r$ of the relation
type $\{\!\{\tau\}\!\}$ is a term of the form $\{\!\{t_1, \ldots, t_n\}\!\}$ such that each $t_i$, $1 \le i \le n$, is a tuple of the type $\tau$.

Regarding a tuple $t$ as a function from a finite subset $L \subseteq \mathcal{L}$ to $\bigcup_{b \in B} B_b \cup \{null_b \mid b \in B\}$, we write $dom(t)$
for the set of labels in $t$ and $t(l)$ for the value corresponding to the label $l$.

Relation instances are terms representing sets, for which the following equations hold:

$\{\!\{t_1, \ldots, t_n\}\!\} = \{\!\{t_{i_1}, \ldots, t_{i_n}\}\!\}$ if $i_1, \ldots, i_n$ is a permutation of $1, \ldots, n$

and

$\{\!\{t_1, t_2, t_3, \ldots\}\!\} = \{\!\{t_2, t_3, \ldots\}\!\}$ if $t_1 = t_2$.

We  consider  relation  instances  as  equivalence  classes  of  the  above  equality.  Under  this  equality,  relation 
instances  behave  exactly  like  sets  of  tuples,  on  which  ordinary  set-theoretic  operations  are  defined.  Based 
on  this  fact,  we  treat  relation  instances  as  sets  of  tuples  and  apply  ordinary  set-theoretic  notions  directly 
to them. Readers might think that this strictly syntactic treatment only introduces (trivial but annoying)
complications to structures that would be simpler and more intuitive if we treated them just as sets. This would
be true if we were only interested in sets of flat tuples. However, it is no longer possible to maintain such
an intuitive treatment when we allow infinite structures through recursion. Our syntactic treatment provides a
uniform  way  to  treat  complex  structures  involving  recursion. 
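
For the flat case, the definitions of tuples and relation instances, and the two equations on relation instances, can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: a tuple is a `dict` from labels to values, every null value is represented by `None` (so the base-type index on the null is dropped), a tuple type is a `dict` from labels to base type names, and `BASE_TYPES`, `has_type` and `relation` are hypothetical names.

```python
# Assumed base types B, mapped to the Python classes of their atomic objects.
BASE_TYPES = {"string": str, "int": int}


def has_type(t, tau):
    """True if tuple t has tuple type tau: same labels, each value atomic or null."""
    if set(t) != set(tau):
        return False
    return all(v is None or isinstance(v, BASE_TYPES[tau[l]]) for l, v in t.items())


def relation(*tuples):
    """A relation instance as a set of tuples: order and repetition do not matter."""
    return frozenset(frozenset(t.items()) for t in tuples)


tau = {"Name": "string", "Age": "int"}
a = {"Name": "Joe", "Age": 21}
b = {"Name": "Ann", "Age": None}
```

With this encoding, `relation(a, b) == relation(b, a)` and `relation(a, a, b) == relation(a, b)` both hold, which corresponds to the permutation and duplicate-elimination equations.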

Among  the  operations  in  the  relational  algebra,  we  only  define  the  join  and  the  projection.  As  we  have 
argued,  these  two  operations  make  the  model  a  successful  data  model  for  databases.  They  also  distinguish 
the model from standard type systems of programming languages. Two tuple types $\tau_1, \tau_2$ are consistent if
for all $l \in dom(\tau_1) \cap dom(\tau_2)$, $\tau_1(l) = \tau_2(l)$. Let $\tau_1, \tau_2$ be two consistent tuple types. Define $jointype(\tau_1, \tau_2)$
as the type $\tau$ such that $dom(\tau) = dom(\tau_1) \cup dom(\tau_2)$ and $\tau(l) = \tau_1(l)$ if $l \in dom(\tau_1)$, otherwise $\tau(l) = \tau_2(l)$.
Two tuples $t_1, t_2$ are consistent if for all $l \in dom(t_1) \cap dom(t_2)$ one of the following holds: (1) $t_1(l) = t_2(l)$,
(2) $t_1(l) = null_b$ and $t_2(l) \in B_b$, or (3) $t_1(l) \in B_b$ and $t_2(l) = null_b$. Two relation types $\{\!\{\tau_1\}\!\}, \{\!\{\tau_2\}\!\}$ are
consistent if $\tau_1, \tau_2$ are consistent. For two consistent relation types $\{\!\{\tau_1\}\!\}, \{\!\{\tau_2\}\!\}$, define $jointype(\{\!\{\tau_1\}\!\}, \{\!\{\tau_2\}\!\})$
as the relation type $\{\!\{jointype(\tau_1, \tau_2)\}\!\}$.

Figure 1: Join of Relations Containing Null Values
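
The consistency conditions and `jointype` translate almost literally into code. The sketch below assumes, purely for illustration, that tuple types are `dict`s from labels to base type names, that tuples are `dict`s from labels to values, and that `None` stands for every null value; the tuples are assumed to be well typed, so the base-type side conditions in (2) and (3) are not re-checked.

```python
def types_consistent(tau1, tau2):
    """Tuple types are consistent if they agree on every shared label."""
    return all(tau1[l] == tau2[l] for l in tau1.keys() & tau2.keys())


def jointype(tau1, tau2):
    """Union of the fields of two consistent tuple types."""
    if not types_consistent(tau1, tau2):
        raise TypeError("inconsistent tuple types")
    # tau1's entries take precedence, although both agree on shared labels.
    return {**tau2, **tau1}


def tuples_consistent(t1, t2):
    """On every shared label the values are equal or at least one is null."""
    return all(
        t1[l] == t2[l] or t1[l] is None or t2[l] is None
        for l in t1.keys() & t2.keys()
    )


tau1 = {"Name": "string", "Age": "int"}
tau2 = {"Name": "string", "Office": "int"}
tau = jointype(tau1, tau2)
```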

Definition 2 (Relational Join) Let $t_1, t_2$ be two consistent tuples of the respective types $\tau_1, \tau_2$. Then
$\tau_1, \tau_2$ are also consistent. The join of $t_1, t_2$, $join(t_1, t_2)$, is the tuple $t$ of the type $jointype(\tau_1, \tau_2)$ such that
$dom(t) = dom(t_1) \cup dom(t_2)$, and $t(l) = t_1(l)$ if $l \in dom(t_1)$ and either $l \notin dom(t_2)$ or $t_2(l) = null_b$, otherwise
$t(l) = t_2(l)$.

Let $r_1 = \{\!\{t_1, \ldots, t_n\}\!\}$, $r_2 = \{\!\{t'_1, \ldots, t'_m\}\!\}$ be two relation instances having the consistent relation types
$\rho_1, \rho_2$ respectively. The (natural) join of $r_1, r_2$, $join(r_1, r_2)$, is the relation instance $r$ of the type $jointype(\rho_1, \rho_2)$
such that $r = \{\!\{t \mid \exists t_i \in r_1\, \exists t_j \in r_2.\ t_i, t_j \text{ are consistent},\ t = join(t_i, t_j)\}\!\}$.
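
A minimal sketch of Definition 2 follows, assuming tuples are Python `dict`s with `None` for null values and relation instances are lists of such tuples. The function names and the sample relations are hypothetical; the data is invented for illustration and is not the content of Figure 1.

```python
def consistent(t1, t2):
    """Shared labels carry equal values, or at least one of them is null."""
    return all(
        t1[l] == t2[l] or t1[l] is None or t2[l] is None
        for l in t1.keys() & t2.keys()
    )


def join_tuples(t1, t2):
    """Join of two consistent tuples, following the case split of Definition 2."""
    if not consistent(t1, t2):
        raise ValueError("tuples are not consistent")
    t = {}
    for l in t1.keys() | t2.keys():
        if l in t1 and (l not in t2 or t2[l] is None):
            t[l] = t1[l]
        else:
            t[l] = t2[l]
    return t


def join_relations(r1, r2):
    """Natural join: join every consistent pair, dropping duplicate results."""
    result = []
    for t1 in r1:
        for t2 in r2:
            if consistent(t1, t2):
                t = join_tuples(t1, t2)
                if t not in result:
                    result.append(t)
    return result


r1 = [{"Name": "Joe", "Dept": "Sales"}, {"Name": "Ann", "Dept": None}]
r2 = [{"Dept": "Sales", "Floor": 2}, {"Dept": "Admin", "Floor": 3}]
r = join_relations(r1, r2)
```

In this example the tuple for Ann has a null `Dept`, so it is consistent with both tuples of `r2` and contributes two tuples to the result, while the tuple for Joe only matches the `Sales` tuple.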

Definition 3 (Relational Projection) Let $t = [l_1 = c_1, \ldots, l_n = c_n, \ldots]$ be a tuple of a type $\tau$ of the form
$[l_1 : b_1, \ldots, l_n : b_n, \ldots]$. The projection of $t$ onto the type $\tau' = [l_1 : b_1, \ldots, l_n : b_n]$, $project_{\tau'}(t)$, is the tuple
$[l_1 = c_1, \ldots, l_n = c_n]$ of type $\tau'$. Let $r$ be a relation instance of the type $\{\!\{\tau\}\!\}$. The projection of $r$ on the type
$\{\!\{\tau'\}\!\}$, $project_{\{\!\{\tau'\}\!\}}(r)$, is the relation instance $\{\!\{project_{\tau'}(t) \mid t \in r\}\!\}$ of the type $\{\!\{\tau'\}\!\}$.
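
Projection by a type rather than by a bare label set can be sketched as follows. As before, the `dict` encoding of tuples and tuple types and the function names are assumptions made for illustration only.

```python
def project_tuple(t, tau_prime):
    """Keep exactly the fields of t that are named in the target tuple type."""
    missing = set(tau_prime) - set(t)
    if missing:
        raise TypeError(f"cannot project: missing labels {sorted(missing)}")
    return {l: t[l] for l in tau_prime}


def project_relation(r, tau_prime):
    """Project every tuple; results that coincide are kept once."""
    result = []
    for t in r:
        p = project_tuple(t, tau_prime)
        if p not in result:
            result.append(p)
    return result


r = [
    {"Name": "Joe", "Age": 21, "Office": 278},
    {"Name": "Joe", "Age": 30, "Office": 278},
]
p = project_relation(r, {"Name": "string", "Office": "int"})
```

The two tuples of `r` differ only in `Age`, so their projections coincide and `p` contains a single tuple.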

When restricted to tuples without null values, the above definitions are clearly straightforward
translations of standard definitions of the relational model found, for example, in [57, 21, 38]. The operation
join is extended to relations containing null values. Figure 1 shows an example of a join of relations containing
null values. Note that the definition of the join reflects the intended semantics of null values. The projection
is specified by a type, not just by a set of labels. This will allow us to generalize the relational projection to
complex structures.

These definitions evidently depend on the underlying structures of flat tuples. There have been some efforts
to generalize these operations beyond first-normal-form relations [48, 1, 22, 32]. (See also [29] for a
survey.) However, their definitions still depend on the underlying tuple structures. Here, we would like to
characterize the join and the projection operations independently of the underlying data structures so that
we can generalize them uniformly to a wide range of complex data structures and introduce them to a type
system of a programming language. Our guiding intuition is the idea exploited in [13] that database objects
are partial descriptions of real-world entities and are ordered in terms of their “goodness” of descriptions.
The idea of partial description was originally suggested by Lipski [35]. The corresponding ordered structure
was first observed by Zaniolo [58] and is closely related to the ordering on ψ-terms [3] and finite state
automata [50].

A preorder is a transitive reflexive relation. Let $(P, \le)$ be a preordered set. Two elements $x, y \in P$ are
consistent if there is some $z \in P$ such that $x \le z$ and $y \le z$. Such a $z$ is called an upper bound of $x, y$. (In what
follows, we only need upper bounds of two elements and therefore we restrict the notion of upper bounds
to upper bounds of two elements.) A least upper bound of $x, y$ is an upper bound $z$ of $x, y$ such that $z \le w$
for any upper bound $w$ of $x, y$. A preordered set $(P, \le)$ has the pairwise bounded join property if any two
consistent elements have a least upper bound. A partial order is an antisymmetric preorder. In a partially
ordered set (poset), least upper bounds are unique. We denote by $x \sqcup y$ the least upper bound of $x, y$ (if it
exists). Any preordered set $(P, \le)$ induces a poset, called the quotient poset induced by $(P, \le)$, denoted by
$[(P, \le)]$: Let $\equiv$ be the equivalence relation on $P$ defined as $x \equiv y$ iff $x \le y$ and $y \le x$. We denote by $[x]$ the
equivalence class containing $x$. Define the set $P/{\equiv}$ as $\{[x] \mid x \in P\}$ and the relation $\le/{\equiv}$ on $P/{\equiv}$ as $[x] \le/{\equiv} [y]$
iff $x \le y$. Then $[(P, \le)]$ is the poset $(P/{\equiv}, \le/{\equiv})$. The following result is standard.

Lemma 1 If $(P, \le)$ is a preordered set with the pairwise bounded join property then $[(P, \le)]$ is a poset with
the pairwise bounded join property.

For generality and simplicity, we treat tuples and relations uniformly. We call both tuple types and
relation types flat description types (ranged over by $\sigma$) and tuples and relation instances flat descriptions
(ranged over by $d$). For each flat description type $\sigma$ we write $D_\sigma$ for the set of descriptions of the type
$\sigma$. A flat description type represents a structure of descriptions. Such structures are naturally ordered to
represent the intuition that one contains the other. For example, if $\sigma_1$ = [Name : string, Age : int] and
$\sigma_2$ = [Name : string, Age : int, Office : int], then the structure represented by $\sigma_2$ contains the structure
represented by $\sigma_1$. This intuitive idea is formalized by the following ordering:

Definition 4 (Ordering on Flat Description Types) The information ordering $\le$ on flat description
types is the smallest relation satisfying:

$[l_1 : b_1, \ldots, l_n : b_n] \le [l_1 : b_1, \ldots, l_n : b_n, \ldots]$

$\{\!\{\tau\}\!\} \le \{\!\{\tau'\}\!\}$ if $\tau \le \tau'$

Since the relation is based on the inclusion of fields of records, it is clear that it is a partial order. Moreover,
this ordering has the following properties:

1. $\le$ on the set of description types has the pairwise bounded join property, and

2. the ordering relation $\le$ is decidable and least upper bounds (if they exist) are effectively computable.
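
Both properties can be made concrete with a short sketch of a decision procedure for the ordering and a least-upper-bound computation. The encoding is an assumption: a tuple type is a `dict` from labels to base type names, a relation type is the pair `("set", tuple_type)`, and `leq` and `lub` are hypothetical names; `lub` returns `None` when the two types are inconsistent.

```python
def leq(s1, s2):
    """Decide s1 <= s2 in the information ordering on flat description types."""
    if isinstance(s1, dict) and isinstance(s2, dict):
        # Every field of s1 must occur in s2 with the same base type.
        return all(l in s2 and s2[l] == b for l, b in s1.items())
    if isinstance(s1, tuple) and isinstance(s2, tuple):
        return leq(s1[1], s2[1])
    return False  # a tuple type and a relation type are incomparable


def lub(s1, s2):
    """Least upper bound of two flat description types, or None if none exists."""
    if isinstance(s1, dict) and isinstance(s2, dict):
        if any(s1[l] != s2[l] for l in s1.keys() & s2.keys()):
            return None
        return {**s1, **s2}
    if isinstance(s1, tuple) and isinstance(s2, tuple):
        inner = lub(s1[1], s2[1])
        return None if inner is None else ("set", inner)
    return None


s1 = {"Name": "string", "Age": "int"}
s2 = {"Name": "string", "Age": "int", "Office": "int"}
s3 = {"Name": "string", "Office": "int"}
```

For these types, `leq(s1, s2)` holds, `lub(s1, s3)` equals `s2`, and `lub({"Age": "int"}, {"Age": "string"})` is `None` because the two types disagree on a shared label.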
The importance of this ordering is that it provides the following characterization of the types of the
relational join and the relational projection:

Theorem 1 (Types of Relational Join and Projection) Let $d_1, d_2$ be flat descriptions of the types $\sigma_1, \sigma_2$
respectively.

1. If $join(d_1, d_2)$ is defined and equal to $d$ then $\sigma_1 \sqcup \sigma_2$ exists and $d$ has the type $\sigma_1 \sqcup \sigma_2$.

2. If $project_\sigma(d_1)$ is defined and equal to $d$ then $\sigma \le \sigma_1$ and $d$ has the type $\sigma$.

Proof The property of join is an immediate consequence of the fact that $jointype(\sigma_1, \sigma_2)$ exists and is equal
to $\sigma$ iff $\sigma_1 \sqcup \sigma_2$ exists and is equal to $\sigma$. The property of project is an immediate consequence of the definition. ∎

We  can  then  give  the  following  type  schemes  (polymorphic  types)  to  the  join  and  the  projection: 

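$join : (\sigma_1 \times \sigma_2) \to \sigma_1 \sqcup \sigma_2$ for all $\sigma_1, \sigma_2$ such that $\sigma_1 \sqcup \sigma_2$ exists

$project : \sigma_1 \to \sigma_2$ for all $\sigma_1, \sigma_2$ such that $\sigma_2 \le \sigma_1$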

Since  the  ordering  relation  is  decidable  and  least  upper  bounds  are  effectively  computable,  these  type  schemes 
allow  us  to  type-check  expressions  containing  joins  and  projections. 

We next characterize these operations themselves using orderings on descriptions. Zaniolo observed [58]
that the introduction of null values induces the following ordering on tuples:

$[l_1 = x_1, \ldots, l_n = x_n] \sqsubseteq [l_1 = y_1, \ldots, l_n = y_n]$ iff either $x_i = null_b$ or $x_i = y_i$, $1 \le i \le n$

This ordering is interpreted as the ordering of “goodness” of descriptions. The following is an example of
this ordering.

[Name = "Joe Doe", Age = $null_{int}$] $\sqsubseteq$ [Name = "Joe Doe", Age = 21]

It is clear that for any tuple type $\tau$ this ordering is a partial order on $D_\tau$ with the pairwise bounded join
property. The join on tuples of the same type is characterized as the least upper bound operation under this
ordering, which formalizes our intuition that the join is an operation that combines partial descriptions:

Proposition 1 (Join of Flat Tuples) If $t_1, t_2 \in D_\tau$ then $join(t_1, t_2) = t$ iff $t_1 \sqcup t_2 = t$.

Proof By definitions. ∎
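
The ordering on tuples of one type and its least upper bound are easy to compute. The following sketch assumes tuples are `dict`s over the same labels with `None` for null values; `below` and `lub` are hypothetical names, and `lub` returns `None` when the two tuples have no upper bound.

```python
def below(t1, t2):
    """t1 is below t2: each field of t1 is null or equal to the field of t2."""
    return t1.keys() == t2.keys() and all(
        t1[l] is None or t1[l] == t2[l] for l in t1
    )


def lub(t1, t2):
    """Least upper bound of two tuples of the same type, or None if inconsistent."""
    if t1.keys() != t2.keys():
        raise ValueError("tuples must have the same type")
    out = {}
    for l in t1:
        x, y = t1[l], t2[l]
        if x is None:
            out[l] = y
        elif y is None or x == y:
            out[l] = x
        else:
            return None  # two different non-null values: no upper bound
    return out


a = {"Name": "Joe", "Age": None}
b = {"Name": None, "Age": 21}
u = lub(a, b)
```

Here `u` fills each null field from the other tuple, and both `below(a, u)` and `below(b, u)` hold, in line with the reading of the join as a combination of partial descriptions.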

For a relation type $\rho$, an appropriate ordering on $D_\rho$ to characterize the join on $D_\rho$ turns out to be the
ordering known as the Smyth powerdomain ordering [56]. To define the ordering, we first define a preorder $\preceq$:

$\{\!\{t_1, \ldots, t_n\}\!\} \preceq \{\!\{t'_1, \ldots, t'_m\}\!\}$ if $\forall t'_j \in \{t'_1, \ldots, t'_m\}\ \exists t_i \in \{t_1, \ldots, t_n\}.\ t_i \sqsubseteq t'_j$

The relation $\preceq$ is not antisymmetric. However, we can take the quotient poset induced by the preorder:

Proposition 2 For any relation type $\rho$, $[(D_\rho, \preceq)]$ is a poset with the pairwise bounded join property.

Proof $\preceq$ is clearly transitive and reflexive and therefore $(D_\rho, \preceq)$ is a preordered set. Let $r_1$ and $r_2$ be
any elements of the preordered set $(D_\rho, \preceq)$. Let $r = \{\!\{t \mid \exists t_i \in r_1\, \exists t_j \in r_2.\ t_i, t_j \text{ are consistent},\ t = join(t_i, t_j)\}\!\}$. Since
$t_i \sqcup t_j = join(t_i, t_j)$, as a special case of the result shown in [56], $r$ is a least upper bound of $r_1$ and $r_2$. Then
the proposition follows from lemma 1. ∎

We regard a relation instance as a representative of the corresponding equivalence class induced by the above
preorder and write $d_1 \sqcup d_2$ for the least upper bound of the corresponding equivalence classes. We also write
$(D_\rho, \sqsubseteq)$ for $[(D_\rho, \preceq)]$. Readers are referred to [13] for the intuition and relevance of this ordering in various
aspects of databases. [12, 49] also use this ordering in the context of partial information. For us, this ordering
provides the following characterization of the join on relations shown in [13]:

Proposition 3 (Join of Flat Relations) If $r_1, r_2 \in D_\rho$ then $join(r_1, r_2) = r$ iff $r_1 \sqcup r_2 = r$.
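
The following sketch illustrates the preorder on relations and the role of the join as an upper bound. It assumes the standard Smyth preorder (every tuple of the larger relation refines some tuple of the smaller one), relations of a single common type encoded as lists of `dict`s, and `None` for null values; all function names are hypothetical.

```python
def below(t1, t2):
    """Tuple ordering: each field of t1 is null or equal to the field of t2."""
    return all(t1[l] is None or t1[l] == t2[l] for l in t1)


def lub_tuples(t1, t2):
    """Least upper bound of two same-type tuples, or None if inconsistent."""
    out = {}
    for l in t1:
        x, y = t1[l], t2[l]
        if x is not None and y is not None and x != y:
            return None
        out[l] = y if x is None else x
    return out


def smyth_leq(r1, r2):
    """r1 precedes r2 if every tuple of r2 is above some tuple of r1."""
    return all(any(below(t1, t2) for t1 in r1) for t2 in r2)


def join(r1, r2):
    """Join of two relations of the same type: lubs of all consistent pairs."""
    out = []
    for t1 in r1:
        for t2 in r2:
            t = lub_tuples(t1, t2)
            if t is not None and t not in out:
                out.append(t)
    return out


a = {"Name": "Joe", "Age": None}
a2 = {"Name": "Joe", "Age": 21}
r1 = [a]
r2 = [{"Name": None, "Age": 21}, {"Name": "Ann", "Age": None}]
r = join(r1, r2)
```

The relations `[a]` and `[a, a2]` precede each other without being equal, which shows why the preorder is not antisymmetric and why the quotient is taken. The joined relation `r` is above both `r1` and `r2`.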

In order to characterize joins of descriptions of different types and projections, we interpret the partially
ordered space of flat description types by coercions between domains.

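Definition 5 (Coercions between Relational Domains) The set of up-coercions is the set of mappings
$\{\phi_{\sigma_1 \to \sigma_2} \mid \sigma_1 \le \sigma_2\}$ defined as

1. if $\sigma_1 = [l_1 : b_1, \ldots, l_n : b_n]$, $\sigma_2 = [l_1 : b_1, \ldots, l_n : b_n, l_{n+1} : b_{n+1}, \ldots, l_{n+m} : b_{n+m}]$ then

$\phi_{\sigma_1 \to \sigma_2}([l_1 = c_1, \ldots, l_n = c_n]) = [l_1 = c_1, \ldots, l_n = c_n, l_{n+1} = null_{b_{n+1}}, \ldots, l_{n+m} = null_{b_{n+m}}]$,

2. if $\sigma_1 = \{\!\{\tau_1\}\!\}$, $\sigma_2 = \{\!\{\tau_2\}\!\}$ and $\tau_1 \le \tau_2$ then

$\phi_{\sigma_1 \to \sigma_2}(r) = \{\!\{\phi_{\tau_1 \to \tau_2}(t) \mid t \in r\}\!\}$.

The set of down-coercions is the set of mappings $\{\psi_{\sigma_1 \to \sigma_2} \mid \sigma_2 \le \sigma_1\}$ defined as

1. if $\sigma_1 = [l_1 : b_1, \ldots, l_n : b_n, \ldots]$ and $\sigma_2 = [l_1 : b_1, \ldots, l_n : b_n]$ then

$\psi_{\sigma_1 \to \sigma_2}([l_1 = c_1, \ldots, l_n = c_n, \ldots]) = [l_1 = c_1, \ldots, l_n = c_n]$,

2. if $\sigma_1 = \{\!\{\tau_1\}\!\}$, $\sigma_2 = \{\!\{\tau_2\}\!\}$ then $\tau_2 \le \tau_1$ and

$\psi_{\sigma_1 \to \sigma_2}(r) = \{\!\{\psi_{\tau_1 \to \tau_2}(t) \mid t \in r\}\!\}$.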

Intuitively, an up-coercion coerces a description to a description of a larger structure by “padding” the extra
part of the structure with null values. A down-coercion on the other hand coerces a description to a description
of a smaller structure by “throwing away” part of its structure. For example, if

$\tau_1$ = [Name : string, Age : int]

$\tau_2$ = [Name : string, Office : int]

$\tau_3$ = [Name : string, Age : int, Office : int]

$t_1$ = [Name = "Joe", Age = 21]

$t_2$ = [Name = "Joe", Office = 278]

$t_3$ = [Name = "Joe", Age = 21, Office = 278]

then

$\phi_{\tau_1 \to \tau_3}(t_1)$ = [Name = "Joe", Age = 21, Office = $null_{int}$]

$\phi_{\tau_2 \to \tau_3}(t_2)$ = [Name = "Joe", Age = $null_{int}$, Office = 278]

$\psi_{\tau_3 \to \tau_1}(t_3) = t_1$

$\psi_{\tau_3 \to \tau_2}(t_3) = t_2$

We then have the following equations:

$join(t_1, t_2) = \phi_{\tau_1 \to \tau_3}(t_1) \sqcup_{D_{\tau_3}} \phi_{\tau_2 \to \tau_3}(t_2)$

$project_{\tau_1}(t_3) = \psi_{\tau_3 \to \tau_1}(t_3)$

$project_{\tau_2}(t_3) = \psi_{\tau_3 \to \tau_2}(t_3)$

This example suggests that computing a join of descriptions of types $\sigma_1, \sigma_2$ corresponds to coercing them
to the type $\sigma_1 \sqcup \sigma_2$ followed by computing their least upper bound. The projections correspond to
down-coercions. Indeed we have:

Theorem 2 (Relational Join and Projection) Let $d_1$ and $d_2$ be any flat descriptions of types $\sigma_1, \sigma_2$
respectively. $join(d_1, d_2)$ exists and is equal to $d$ iff $\sigma_1 \sqcup \sigma_2$ exists and $d = \phi_{\sigma_1 \to \sigma}(d_1) \sqcup_{D_\sigma} \phi_{\sigma_2 \to \sigma}(d_2)$ where
$\sigma = \sigma_1 \sqcup \sigma_2$. $project_\sigma(d_1)$ exists and is equal to $d$ iff $\sigma \le \sigma_1$ and $d = \psi_{\sigma_1 \to \sigma}(d_1)$.

Proof By the definitions of $\phi$ and join, for any $d_1$ of type $\sigma_1$ and $d_2$ of type $\sigma_2$ such that $\sigma_1 \sqcup \sigma_2$ exists and is
equal to $\sigma$, $join(d_1, d_2)$ exists and is equal to $d$ iff $join(\phi_{\sigma_1 \to \sigma}(d_1), \phi_{\sigma_2 \to \sigma}(d_2))$ exists and is equal to $d$. Then the
property of join follows from propositions 1 and 3. The property of projection is by definitions. ∎
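
The decomposition of the join into up-coercions followed by a least upper bound can be sketched for flat tuples as follows. The encoding is an assumption made for illustration: tuple types and tuples are `dict`s, `None` stands for every null value, and the names `up`, `down`, `lub_types`, `lub_tuples` and `join` are hypothetical.

```python
def lub_types(s1, s2):
    """Least upper bound of two tuple types, or None if they are inconsistent."""
    if any(s1[l] != s2[l] for l in s1.keys() & s2.keys()):
        return None
    return {**s1, **s2}


def up(t, sigma):
    """Up-coercion: pad t with nulls for the fields of sigma that it lacks."""
    if not set(t) <= set(sigma):
        raise TypeError("target type is not larger than the type of t")
    return {l: t.get(l) for l in sigma}


def down(t, sigma):
    """Down-coercion: throw away the fields of t that are not in sigma."""
    return {l: t[l] for l in sigma}


def lub_tuples(t1, t2):
    """Least upper bound of two tuples of the same type, or None."""
    out = {}
    for l in t1:
        x, y = t1[l], t2[l]
        if x is not None and y is not None and x != y:
            return None
        out[l] = y if x is None else x
    return out


def join(t1, s1, t2, s2):
    """Coerce both tuples up to the lub of their types, then take their lub."""
    sigma = lub_types(s1, s2)
    if sigma is None:
        return None
    return lub_tuples(up(t1, sigma), up(t2, sigma))


tau1 = {"Name": "string", "Age": "int"}
tau2 = {"Name": "string", "Office": "int"}
tau3 = {"Name": "string", "Age": "int", "Office": "int"}
t1 = {"Name": "Joe", "Age": 21}
t2 = {"Name": "Joe", "Office": 278}
t3 = {"Name": "Joe", "Age": 21, "Office": 278}
```

With the example values of the text, `join(t1, tau1, t2, tau2)` yields `t3`, and `down(t3, tau1)` and `down(t3, tau2)` recover `t1` and `t2`.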

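The semantic space of the relational model is therefore characterized by the set

$\{(D_\sigma, \sqsubseteq) \mid \sigma \text{ is a flat description type}\}$

connected by the set of pairs of up- and down-coercions

$\{(\phi_{\sigma_1 \to \sigma_2}, \psi_{\sigma_2 \to \sigma_1}) \mid \sigma_1 \le \sigma_2\}$

associated with the set of join operations $\{join_{(\sigma_1 \times \sigma_2) \to \sigma} \mid \sigma_1 \sqcup \sigma_2 \text{ exists and is equal to } \sigma\}$ defined as

$join_{(\sigma_1 \times \sigma_2) \to \sigma}(d_1, d_2) = \phi_{\sigma_1 \to \sigma}(d_1) \sqcup_{D_\sigma} \phi_{\sigma_2 \to \sigma}(d_2)$

and the set of projection operations $\{project_{\sigma_1 \to \sigma_2} \mid \sigma_2 \le \sigma_1\}$ defined as

$project_{\sigma_1 \to \sigma_2}(d) = \psi_{\sigma_1 \to \sigma_2}(d)$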

The  importance  of  this  characterization  is  that  it  applies  to  any  set  of  domains  on  which  we  can  define 
information  orderings  and  appropriate  sets  of  coercions.  Based  on  this  analysis,  in  the  next  section,  we 
formally  define  the  structures  of  type  systems  for  databases  and  their  semantic  domains. 


3  Database Domains


As a generalization of the set of flat description types in the relational model, we define a set of types for
databases as follows:

Definition 6 (Database Type Systems) A database type system is a poset of types $(T, \le)$ such that

1. it has the pairwise bounded join property, and

2. the ordering relation and least upper bounds (if they exist) are effectively computable.

We call each element of $T$ a description type.

Each  type  represents  a  structure  of  descriptions  and  the  ordering  on  types  represents  the  containment 
ordering  of  the  structures  they  represent.  The  pairwise  bounded  join  condition  is  necessary  for  the  types  of 
joins to be well defined. The decidability condition is necessary for effective type-checking.

Each  description  type  should  denote  a  domain  of  descriptions.  As  a  generalization  of  domains  of  flat 
descriptions  in  the  relational  model,  we  require  domains  of  descriptions  to  satisfy  the  following  conditions: 

Definition 7 (Description Domains) A description domain is a poset $(D, \sqsubseteq)$ satisfying:

1. $D$ has the bottom element $null_D$, i.e. for any $d \in D$, $null_D \sqsubseteq d$,

2. $D$ has the pairwise bounded join property,

3. the ordering relation $\sqsubseteq$ is decidable and least upper bounds (if they exist) are effectively computable.

Condition 1 allows us to represent a non-informative value, which is essential for partial descriptions. Condition 2
states that if we have two consistent descriptions then the combination of the two is also representable
as a description. This is necessary for join to be well defined. The necessity of condition 3 is obvious.

It  should  be  noted  that  description  domains  are  models  of  types  of  database  objects  and  not  models  of 
general  types  in  programming  languages  such  as  function  types.  In  particular,  they  should  not  be  confused 
with Scott domains [54], which are used to give semantics to untyped lambda calculus and programming languages with recursively defined functions [51]. Both notions share a similar ordered structure and are based on a similar intuition that values are ordered in terms of "goodness of approximation". However, the properties of the two orderings are fundamentally different. The ordering on a description domain is just a computable predicate. On the other hand, the Scott ordering is regarded as a predicate on computability and is in principle not computable.

By abstracting the underlying tuple structures from the definition of up-coercions and down-coercions between relational domains, we interpret an ordering on description types by a special class of mappings between description domains. A function $f : D_1 \to D_2$ between description domains $D_1, D_2$ is monotone iff for any $x, y \in D_1$, $x \sqsubseteq y$ implies $f(x) \sqsubseteq f(y)$.

Definition 8 (Embeddings and Projections) A monotone function $\phi : D_1 \to D_2$ is an embedding if there exists a function $\psi : D_2 \to D_1$ such that (1) for any $x \in D_2$, $\phi(\psi(x)) \sqsubseteq x$ and (2) for any $x \in D_1$, $\psi(\phi(x)) = x$. The function $\psi$ is called a projection.

A  pair  of  embedding  and  projection  is  a  special  case  of  Galois  connections  (or  adjunctions),  for  which  the 
following  result  is  well  known  [23]: 

Lemma 2 Given an embedding $\phi : D_1 \to D_2$, the corresponding projection is uniquely determined by $\phi$.

If $\phi$ is an embedding, we sometimes denote by $\phi^R$ the corresponding projection.

If a pair of description domains $(D_1, D_2)$ has an embedding-projection pair $(\phi : D_1 \to D_2, \psi : D_2 \to D_1)$ then $D_2$ contains an isomorphic copy $D_1' = \phi(D_1)$ of $D_1$ and for any element $d$ in $D_2$ there is a unique maximal element $d' \in D_1'$ such that $d' \sqsubseteq d$. We regard this property as the semantics of the ordering of description types. $\phi$ maps an element $d \in D_1$ to the least element $d' \in D_2$ such that $d'$ contains all information in $d$. $\psi$ maps an element $d \in D_2$ to the unique maximal element $d' \in D_1$ that contains only information in $d$, and is regarded as a database projection from $D_2$ to $D_1$. The set of up-coercions we have defined on relational domains is indeed the set of embeddings between relational domains. The corresponding projections are exactly the down-coercions.
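The two defining laws of an embedding-projection pair can be checked on a small example. The following Python fragment is only a sketch under stated assumptions: a flat description is modeled as a dictionary from attribute names to atomic values, `None` stands for the null value, and the names `leq`, `embed` and `project` are hypothetical helpers introduced for illustration.

```python
def leq(d1, d2):
    """d1 is below d2: same attributes, and every non-null field of d1 agrees with d2."""
    return d1.keys() == d2.keys() and all(
        v is None or v == d2[k] for k, v in d1.items()
    )


def embed(d, attrs2):
    """Up-coercion: pad a description with nulls up to the larger attribute set."""
    return {k: d.get(k) for k in attrs2}


def project(d, attrs1):
    """Down-coercion: keep only the attributes of the smaller domain."""
    return {k: d[k] for k in attrs1}


SMALL = ["Name"]
LARGE = ["Name", "Age"]

x = {"Name": "Joe"}               # an element of the smaller domain
y = {"Name": "Joe", "Age": 21}    # an element of the larger domain

# Law (2): projecting an embedded element gives the element back.
roundtrip_small = project(embed(x, LARGE), SMALL)
# Law (1): embedding a projected element gives something below the original.
roundtrip_large = embed(project(y, SMALL), LARGE)
```

Here `roundtrip_small` equals `x`, while `roundtrip_large` has lost the `Age` field and therefore lies below `y` without being equal to it.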

Our characterization of the ordering on types can be regarded as a refinement of one of the characterizations of subtypes proposed by Bruce and Wegner [11], where the notion of subtypes is characterized in three
ways;  one  of  them  being  that  the  larger  set  contains  an  isomorphic  copy  of  the  smaller.  It  is  also  related 
to  the  notion  of  information  capacity  of  data  structures  studied  in  [30]  where  the  ordering  on  various  data 
structures  was  defined  by  using  mappings  between  the  sets  of  objects. 

Finally  we  define  a  semantic  space  of  a  database  type  system  as  a  space  of  description  domains  partially 
ordered  by  a  set  of  embedding-projection  pairs. 

Definition 9 (Database Domains) A database domain is a pair (Dom, Emb) of a set of description domains Dom and a set of embeddings Emb between elements of Dom satisfying the following conditions:

1. For any two domains $D_1, D_2 \in Dom$, there is at most one $\phi \in Emb$ such that $\phi : D_1 \to D_2$. We write $\phi_{D_1 \to D_2}$ for an embedding of type $D_1 \to D_2$.

2. For any domain $D \in Dom$, $\phi_{D \to D} \in Emb$.

3. Emb is closed under composition.

4. For any two domains $D_1, D_2 \in Dom$, if there is some $D \in Dom$ such that $\phi_{D_1 \to D} \in Emb$ and $\phi_{D_2 \to D} \in Emb$ then there is a unique $D' \in Dom$ depending only on $D_1, D_2$ such that $\phi_{D_1 \to D'} \in Emb$, $\phi_{D_2 \to D'} \in Emb$ and for any $D'' \in Dom$, if $\phi_{D_1 \to D''} \in Emb$ and $\phi_{D_2 \to D''} \in Emb$ then $\phi_{D' \to D''} \in Emb$.

5. For any $\phi \in Emb$, both $\phi$ and $\phi^R$ are computable, i.e. there is an algorithm to compute $\phi(d)$ and $\phi^R(d')$ for any given $d \in dom(\phi)$ and $d' \in dom(\phi^R)$.

Condition 1 means that the set of embeddings defines a relation on Dom. Moreover,

Proposition 4 The relation defined by Emb is a partial order with the pairwise bounded join property.

Proof From conditions 2 and 3, the relation is reflexive and transitive. For antisymmetry, suppose $\phi_{X \to Y} \in Emb$ and $\phi_{Y \to X} \in Emb$ for some $X, Y \in Dom$. Since $\phi_{X \to X} \in Emb$ and $\phi_{Y \to Y} \in Emb$, the uniqueness of $D'$ in condition 4 implies $X = Y$. The pairwise bounded join property is an immediate consequence of condition 4. ∎

Definition 10 (Models of Database Type Systems) Let $(T, \leq)$ be a database type system. A database domain (Dom, Emb) is a model of $(T, \leq)$ if there is a mapping $\mu : T \to Dom$ such that for any $\tau_1, \tau_2 \in T$, $\tau_1 \leq \tau_2$ iff $\phi_{\mu(\tau_1) \to \mu(\tau_2)} \in Emb$.

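Recall that on description domains we imposed the conditions that the ordering is decidable and least upper bounds are computable. Combined with the computability condition on embeddings and projections, they guarantee that the join and the projection defined as

$$join_{(\sigma_1 \times \sigma_2) \to \sigma_1 \sqcup \sigma_2}(d_1, d_2) = \phi_{\sigma_1 \to \sigma_1 \sqcup \sigma_2}(d_1) \sqcup \phi_{\sigma_2 \to \sigma_1 \sqcup \sigma_2}(d_2) \qquad (1)$$

$$project_{\sigma_1 \to \sigma_2}(d) = \psi_{\sigma_2 \to \sigma_1}(d) \qquad (2)$$

are always computable functions, where $\psi_{\sigma_2 \to \sigma_1}$ denotes the projection corresponding to the embedding $\phi_{\sigma_2 \to \sigma_1}$. This means that if a database type system has a model, then the join and the projection are available as computable functions with the following polymorphic types:

$$join : (\sigma_1 \times \sigma_2) \to \sigma_1 \sqcup \sigma_2 \quad \text{for all } \sigma_1, \sigma_2 \text{ such that } \sigma_1 \sqcup \sigma_2 \text{ exists} \qquad (3)$$

$$project : \sigma_1 \to \sigma_2 \quad \text{for all } \sigma_1, \sigma_2 \text{ such that } \sigma_2 \leq \sigma_1 \qquad (4)$$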


The relational join and the relational projection are special cases of the above functions on flat tuple structures. Moreover, from the previous results, we have:

Theorem  3  The  set  of  flat  description  types  with  the  information  ordering  <  is  a  database  type  system. 
The  pair  of  the  set  of  relational  domains  and  the  set  of  up-coercions 

$$(\{(D_\sigma, \sqsubseteq) \mid \sigma \text{ is a flat description type}\},\ \{\phi_{\sigma_1 \to \sigma_2} \mid \sigma_1 \leq \sigma_2\})$$

is  a  database  domain  and  a  model  of  the  poset  of  flat  description  types. 

We therefore claim that the notions of database type systems and database domains are a proper generalization of the relational model.
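For flat tuples, the join and the projection can be sketched as compositions of coercions and least upper bounds. The Python fragment below is an illustration under assumptions, not the paper's construction: a tuple is a dictionary, `None` is the null value, the least upper bound of two flat types is taken to be the union of their attribute sets, and all function names are hypothetical.

```python
def embed(d, attrs):
    """Up-coercion: pad a tuple with nulls up to the attribute set attrs."""
    return {k: d.get(k) for k in attrs}


def lub(d1, d2):
    """Least upper bound of two tuples over the same attributes, or None if inconsistent."""
    out = {}
    for k in d1:
        a, b = d1[k], d2[k]
        if a is not None and b is not None and a != b:
            return None          # the two tuples disagree on attribute k
        out[k] = b if a is None else a
    return out


def join(d1, d2):
    """Coerce both tuples to the least upper bound of their types, then combine them."""
    attrs = sorted(set(d1) | set(d2))
    return lub(embed(d1, attrs), embed(d2, attrs))


def project(d, attrs):
    """Down-coercion onto a smaller attribute set."""
    return {k: d[k] for k in attrs}


joined = join({"Name": "Joe", "Age": 21}, {"Name": "Joe", "Office": 27})
clash = join({"Name": "Joe"}, {"Name": "Sue"})
```

In this sketch `joined` carries all three attributes, `clash` is `None` because the two tuples are inconsistent, and `project(joined, ["Name"])` recovers the common part.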

The  advantage  of  this  characterization  is  that  it  is  independent  of  the  actual  structures  of  types  and 
objects. This allows us to generalize the relational model to a wide range of structures, even those that
include  recursively  defined  types  and  objects.  In  the  next  section  we  construct  a  database  type  system 
and  its  database  domain,  which  we  believe  is  rich  enough  to  cover  virtually  all  proposed  representations  of 
complex  database  objects. 


4  A  Type  System  for  Complex  Database  Objects 

In addition to finite structures representable by finite terms, we would like to allow recursively defined structures, which naturally emerge in descriptions of real-world entities. As demonstrated by Aït-Kaci [3], an appropriate formalism to represent these structures is regular trees, which provide a sufficiently rich yet computationally feasible framework for complex data structures. We therefore develop our type system and its domain using regular trees. However, this generality creates a slight technical complication: we cannot use inductive methods to define structures and to prove properties. This may yield less intuitive definitions and might decrease the readability of the rest of the paper. To mitigate this, for major definitions and properties we give equivalent inductive characterizations on finite trees. They will not be used in the subsequent development and we shall omit the proofs of their equivalence to the original definitions restricted to finite trees. They can be proved by the usual structural induction.

4.1  Labeled  Regular  Trees 

We  gather  definitions  and  standard  results  on  regular  trees.  Main  references  on  this  subject  are  [19,  18]. 

Let $A$ be a set of symbols. The set of all strings (finite sequences of symbols) over $A$ is denoted by $A^*$. The length of a string $a \in A^*$ is denoted by $|a|$. The empty string $\epsilon$ is the string of length 0. The concatenation of $a, b \in A^*$ is denoted by $a \cdot b$. A string $a$ is a prefix of a string $b$ if there is some $c$ such that $b = a \cdot c$. A prefix $a$ of $b$ is proper if $a \neq b$. For $X \subseteq A^*$ and $Y \subseteq A^*$, $X \cdot Y$ is the set $\{x \cdot y \mid x \in X, y \in Y\}$. We write $x \cdot Y$ for $\{x\} \cdot Y$ and $X \cdot y$ for $X \cdot \{y\}$. For $a \in A^*$ and $X \subseteq A^*$, $X/a$ is the set $\{b \mid \exists c \in X \text{ such that } c = a \cdot b\}$. We identify an element $a \in A$ and the corresponding string $a$ of length one.

Instead of using a standard representation of trees based on fixed arity function symbols with ordered arguments, we use labeled trees whose nodes are labeled with function symbols and whose edges are labeled with elements in $\mathcal{L}$ indicating their arguments. This is a generalization of labeled record structures and is particularly suitable for representing complex structures including recursively defined ones. The following definition is due to [3].

Definition 11 (Labeled Trees) Let $F$ be a (not necessarily finite) set of symbols. A labeled $F$-tree is a function $\alpha : L \to F$ such that $L$ is a prefix-closed subset of $\mathcal{L}^*$, i.e. for any $a, b \in \mathcal{L}^*$, if $a \cdot b \in L$ then $a \in L$. A tree $\alpha$ is finite if its domain $dom(\alpha)$ is finite; otherwise it is infinite. The set of all $F$-trees and the set of all finite $F$-trees are denoted respectively by $T^\infty(F)$ and $T(F)$.

Note that we do not impose an arity restriction on function symbols. However, we can regard each function symbol $f \in F$ as the set of symbols $\{f_{\{l_1, \ldots, l_n\}} \mid l_1, \ldots, l_n \in \mathcal{L}\}$ indexed by finite sets of labels. By assuming a total order $\ll$ on $\mathcal{L}$, we can then regard our definition of trees as a notational variant of the standard representation of trees found in [19, 18] based on the tree domains [24]. We omit formal treatment of the connection.

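For any element $f \in F$, we also denote by $f$ the one node tree such that $dom(f) = \{\epsilon\}$ and $f(\epsilon) = f$.

Let $\alpha_1, \ldots, \alpha_n \in T^\infty(F)$, $l_1, \ldots, l_n \in \mathcal{L}$ and $f \in F$. We write $f(l_1 = \alpha_1, \ldots, l_n = \alpha_n)$ to denote the tree $\alpha$ such that $dom(\alpha) = \{\epsilon\} \cup l_1 \cdot dom(\alpha_1) \cup \cdots \cup l_n \cdot dom(\alpha_n)$, $\alpha(\epsilon) = f$, and $\alpha(l_i \cdot a) = \alpha_i(a)$ for all $a \in dom(\alpha_i)$, $1 \leq i \leq n$. If $\alpha \in T^\infty(F)$ and $a \in dom(\alpha)$ then the subtree at $a$ in $\alpha$, denoted by $\alpha/a$, is the tree $\alpha'$ such that $dom(\alpha') = dom(\alpha)/a$, and for all $b \in dom(\alpha')$, $\alpha'(b) = \alpha(a \cdot b)$. The set of all subtrees of a tree $\alpha$ is the set $Subtrees(\alpha) = \{\alpha/a \mid a \in dom(\alpha)\}$.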
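For finite trees these definitions translate directly into code. The following Python sketch assumes a tree is stored as a dictionary from paths (tuples of labels) to symbols; the helper names `is_tree`, `build` and `subtree` are hypothetical, and infinite trees are out of scope for this representation.

```python
def is_tree(t):
    """The domain must be prefix-closed: each non-empty path has its parent path present."""
    return all(p[:-1] in t for p in t if p)


def build(f, children):
    """The tree f(l1 = t1, ..., ln = tn), with children given as a dict label -> tree."""
    t = {(): f}
    for label, sub in children.items():
        for path, sym in sub.items():
            t[(label,) + path] = sym
    return t


def subtree(t, a):
    """The subtree of t at path a."""
    n = len(a)
    return {p[n:]: sym for p, sym in t.items() if p[:n] == a}


int_leaf = build("int", {})
point = build("Record", {"X-cord": int_leaf, "Y-cord": int_leaf})
```

With this encoding `point` has three nodes, and `subtree(point, ("X-cord",))` is the one-node tree `int`.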
Definition 12 (Regular Trees) A tree $\alpha \in T^\infty(F)$ is regular iff the set $Subtrees(\alpha)$ is finite. The set of all regular trees in $T^\infty(F)$ is denoted by $R(F)$.

Intuitively, regular trees are trees that have a finite representation. There are several equivalent representations of regular trees. Following [3], we use Moore machines to represent them.

Definition 13 (Moore Machine) A Moore machine is a 5-tuple $(Q, s, F, \delta, \lambda)$, where $Q$ is a finite set of states, $s$ is a distinguished element in $Q$ called the start state, $F$ is the set of output symbols, $\delta$ is a partial function from $Q \times \mathcal{L}$ to $Q$ called the state transition function such that for any $q \in Q$, $\{l \in \mathcal{L} \mid \delta(q, l) \text{ is defined}\}$ is finite, and $\lambda$ is the output function from $Q$ to $F$.

In the above definition, the input alphabet is implicitly assumed to be the fixed set $\mathcal{L}$ of labels. Because of the restriction on $\delta$, a Moore machine under the above definition behaves like a Moore machine under a standard definition where the input alphabet is finite and $\delta$ is defined as a total function on $Q \times \mathcal{L}$. As is done in standard finite state automata [26], we extend $\delta$ to $\delta^*$ on $Q \times \mathcal{L}^*$. A state $q \in Q$ is reachable if there is some $a \in \mathcal{L}^*$ such that $\delta^*(s, a) = q$. Each state $q \in Q$ of a Moore machine $M = (Q, s, F, \delta, \lambda)$ represents a function from a prefix-closed subset of $\mathcal{L}^*$ to $F$. Define $M(q)$ as the function such that $dom(M(q)) = \{a \in \mathcal{L}^* \mid \delta^*(q, a) = q' \text{ for some } q' \in Q\}$ and $M(q)(a) = \lambda(\delta^*(q, a))$ for all $a \in dom(M(q))$.
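To make the function $M(q)$ concrete, the following Python sketch encodes a Moore machine as a start state, a transition table and an output table, and evaluates the represented tree at a path. The encoding and the names `run`, `tree_at` and `INTLIST` are assumptions made for illustration; the example machine is meant to represent a recursive integer-list type built from one variant node, two record nodes and a base type.

```python
def run(delta, q, path):
    """Extended transition function; returns None when the path leaves the domain."""
    for label in path:
        q = delta.get((q, label))
        if q is None:
            return None
    return q


def tree_at(machine, path):
    """The symbol M(s)(path), or None if path is not in dom(M(s))."""
    start, delta, out = machine
    q = run(delta, start, path)
    return None if q is None else out[q]


# States: "v" (variant), "c" (record under Cons), "i" (int), "n" (empty record under Nil).
INTLIST = (
    "v",
    {
        ("v", "Cons"): "c",
        ("v", "Nil"): "n",
        ("c", "Head"): "i",
        ("c", "Tail"): "v",   # the back edge that makes the tree infinite
    },
    {"v": "Variant", "c": "Record", "i": "int", "n": "Record"},
)

deep = tree_at(INTLIST, ("Cons", "Tail", "Cons", "Tail", "Cons", "Head"))
```

Although the represented tree is infinite, four states suffice, which reflects the finiteness of its set of subtrees.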

The  following  theorem  establishes  the  relationship  between  Moore  machines  and  regular  trees,  which  is 
essentially the same as the equivalence of regular trees and regular systems shown in [19]. The proof can be
easily  reconstructed  from  the  corresponding  proof. 

Theorem 4 For any Moore machine $M = (Q, s, F, \delta, \lambda)$, $M(s) \in R(F)$. Conversely, for any regular tree $\alpha \in R(F)$ there is a Moore machine $M = (Q, s, F, \delta, \lambda)$ such that $\alpha = M(s)$.

We  say  that  a  regular  tree  a  is  represented  by  a  Moore  machine  M  if  M(s)  =  a. 

We  use  the  following  term  language  to  represent  regular  trees  via  Moore  machines: 

$$e ::= s \mid f \mid f(l = e, \ldots, l = e) \mid (rec\ s.\ e)$$

where $f$ stands for elements of $F$, $l$ stands for elements of $\mathcal{L}$ and $s$ stands for state variables, drawn from a set disjoint from other symbols. The state variables are bound variables similar to those in lambda calculi. A term $e$ is proper if every state variable occurrence $s$ is either an occurrence of the form rec $s$ or in some $e'$ in (rec $s$. $e'$).

For a proper term $e$, define the Moore machine $M_e = (Q, s, F, \delta, \lambda)$ as:

1. $Q = \{q_f \mid \text{for each occurrence } f \in F \text{ in } e\}$,

2. $s = q_f$ where $f$ is the outermost occurrence of an output symbol in $e$,

3. $\delta(q_f, l) = q_g$ iff either $f, g$ are the occurrences in a subterm of the form $f(\ldots, l = g(\ldots), \ldots)$ or $f, g$ are the occurrences in a subterm of the form $(rec\ s.\ g(\ldots f(\ldots, l = s, \ldots) \ldots))$ such that it is the smallest subterm of the form $(rec\ s.\ \cdots)$ surrounding $f(\ldots, l = s, \ldots)$,

4. $\lambda(q_f) = f$.

The regular tree represented by a proper term $e$ is then defined as $M_e(s)$. It can also be shown that for any regular tree $\alpha$ there is a proper term $e$ that represents $\alpha$.

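For technical convenience we assume that the set of labels $\mathcal{L}$ is closed under products, i.e. there is an injective function $prodcode : (\mathcal{L} \times \mathcal{L}) \to \mathcal{L}$. For any given set of labels, we can construct a set satisfying this condition. We use $prodcode$ implicitly and treat $\mathcal{L}$ as a set satisfying $\mathcal{L} \times \mathcal{L} \subseteq \mathcal{L}$. In particular $(\mathcal{L} \times \mathcal{L})^* \subseteq \mathcal{L}^*$. On $(\mathcal{L} \times \mathcal{L})^*$ we define the mappings first, second inductively as follows:

$$first(\epsilon) = \epsilon$$
$$first(a \cdot (l_1, l_2)) = first(a) \cdot l_1$$
$$second(\epsilon) = \epsilon$$
$$second(a \cdot (l_1, l_2)) = second(a) \cdot l_2$$

On $\{(a, b) \mid a \in \mathcal{L}^*, b \in \mathcal{L}^*, |a| = |b|\}$, we define pair as follows:

$$pair(\epsilon, \epsilon) = \epsilon$$
$$pair(a \cdot l_1, b \cdot l_2) = pair(a, b) \cdot (l_1, l_2)$$

For $a \in (\mathcal{L} \times \mathcal{L})^*$, the following equation always holds:

$$pair(first(a), second(a)) = a$$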
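These three mappings amount to unzipping and zipping sequences of label pairs. A minimal Python sketch, assuming strings over labels are represented as tuples and a product label as a 2-tuple:

```python
def first(a):
    """Left components of a sequence of label pairs."""
    return tuple(l1 for l1, _ in a)


def second(a):
    """Right components of a sequence of label pairs."""
    return tuple(l2 for _, l2 in a)


def pair(a, b):
    """Combine two label sequences of equal length into a sequence of pairs."""
    if len(a) != len(b):
        raise ValueError("pair is only defined on sequences of equal length")
    return tuple(zip(a, b))


path = (("Pilots", "Pilots"), ("elm1", "elm3"), ("Name", "Name"))
```

For the sample `path`, `first(path)` and `second(path)` differ only in the set element position, and `pair(first(path), second(path))` reconstructs `path`.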

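Let $r$ be a relation on $\mathcal{L}$. The extension of $r$ on $\mathcal{L}^*$, denoted by $\bar{r}$, is the relation defined as:

$$\epsilon \mathrel{\bar{r}} \epsilon$$

$$a \cdot l_1 \mathrel{\bar{r}} b \cdot l_2 \quad \text{iff} \quad a \mathrel{\bar{r}} b \text{ and } l_1 \mathrel{r} l_2$$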

The  following  construction  on  Moore  machines,  which  “traces”  two  Moore  machines  in  “parallel”,  is  often 
useful  to  determine  various  relations  on  regular  trees.  This  can  be  regarded  as  a  generalization  of  the  merged 
transaction  function  used  to  determine  the  equivalence  of  two  finite  state  machines  in  [27].  The  new  symbol 
$  introduced  below  represents  a  “rejecting  state”  in  a  standard  representation. 

Definition 14 (Product Machine) Let $\sim$ be any equivalence relation on $\mathcal{L}$. Given two Moore machines $M_1 = (Q_1, s_1, F_1, \delta_1, \lambda_1)$ and $M_2 = (Q_2, s_2, F_2, \delta_2, \lambda_2)$, the product machine of $M_1$ and $M_2$ modulo $\sim$, written $(M_1 \times M_2)/{\sim}$, is the Moore machine $(Q, s, F, \delta, \lambda)$ such that

1. $Q = (Q_1 \cup \{\$\}) \times (Q_2 \cup \{\$\})$ where $\$$ is a new distinguished symbol that appears in neither $M_1$ nor $M_2$,

2. $s = (s_1, s_2)$,

3. $F = (F_1 \cup \{\$\}) \times (F_2 \cup \{\$\})$,

4. $\delta((x, y), l)$ is defined and equal to $(x', y')$ iff one of the following holds:

(a) $l = (l_1, l_2)$, $l_1 \neq \$$, $l_2 \neq \$$, $l_1 \sim l_2$, $x \in Q_1$, $y \in Q_2$, and $x' = \delta_1(x, l_1)$, $y' = \delta_2(y, l_2)$,

(b) $l = (l_1, l_2)$, $l_1 \neq \$$, $l_2 = \$$, $x \in Q_1$, either there is no $l'$ such that $\delta_2(y, l')$ is defined and $l_1 \sim l'$ or $y = \$$, and $x' = \delta_1(x, l_1)$, $y' = \$$,

(c) $l = (l_1, l_2)$, $l_1 = \$$, $l_2 \neq \$$, $y \in Q_2$, either there is no $l'$ such that $\delta_1(x, l')$ is defined and $l_2 \sim l'$ or $x = \$$, and $x' = \$$, $y' = \delta_2(y, l_2)$,

5. $\lambda((x_1, x_2)) = (o_1, o_2)$ where $o_i = \lambda_i(x_i)$ if $x_i \in Q_i$, otherwise $o_i = \$$, $i \in \{1, 2\}$.

If $\sim$ is the identity relation $=$ on $\mathcal{L}$ then we write $M_1 \times M_2$ for $(M_1 \times M_2)/{=}$.
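The idea of tracing two machines in parallel can be illustrated with a deliberately simplified variant of the construction. The Python sketch below is not the product machine itself: it takes the identity relation on labels, omits the rejecting symbol, and simply stops at the first mismatch in order to decide whether two machines represent the same regular tree. The machine encoding (start state, nested transition dictionary, output dictionary) and the name `same_tree` are assumptions made for this example.

```python
from collections import deque


def same_tree(m1, m2):
    """Decide whether two Moore machines represent the same regular tree."""
    (s1, d1, o1), (s2, d2, o2) = m1, m2
    seen = {(s1, s2)}
    todo = deque([(s1, s2)])
    while todo:
        x, y = todo.popleft()
        edges1, edges2 = d1.get(x, {}), d2.get(y, {})
        # Paired states must output the same symbol and offer the same labels.
        if o1[x] != o2[y] or edges1.keys() != edges2.keys():
            return False
        for label in edges1:
            nxt = (edges1[label], edges2[label])
            if nxt not in seen:       # each pair of states is visited at most once
                seen.add(nxt)
                todo.append(nxt)
    return True


# (rec u. f(l = u)) with one state, and the same tree unrolled once.
A = ("q", {"q": {"l": "q"}}, {"q": "f"})
B = (0, {0: {"l": 1}, 1: {"l": 1}}, {0: "f", 1: "f"})
# The finite tree f(l = g).
C = (0, {0: {"l": 1}}, {0: "f", 1: "g"})
```

Since only finitely many pairs of states exist, the traversal terminates even when both trees are infinite; in the full construction the rejecting symbol keeps the traversal going after a mismatch so that other relations on trees can be read off from the reachable states.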

The  construction  of  a  product  machine  is  clearly  effective.  The  following  properties  are  also  immediate 
consequences  of  the  definition: 

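Lemma 3 Let $M_1 = (Q_1, s_1, F_1, \delta_1, \lambda_1)$, $M_2 = (Q_2, s_2, F_2, \delta_2, \lambda_2)$ and $(Q, s, F, \delta, \lambda) = (M_1 \times M_2)/{\sim}$.

1. If $\delta^*(s, a) = (q_1, q_2)$, $q_1 \in Q_1$, $q_2 \in Q_2$ then $first(a) \mathrel{\bar{\sim}} second(a)$ and $\delta_1^*(s_1, first(a)) = q_1$, $\delta_2^*(s_2, second(a)) = q_2$. Conversely, if there are $a, b$ such that $a \mathrel{\bar{\sim}} b$, $\delta_1^*(s_1, a) = q_1$ and $\delta_2^*(s_2, b) = q_2$ then $\delta^*(s, pair(a, b)) = (q_1, q_2)$.

2. If $\delta^*(s, a) = (q, x)$, $q \in Q_1$ then $\delta_1^*(s_1, first(a)) = q$ and $first(\lambda((q, x))) = \lambda_1(q)$. If $\delta^*(s, a) = (x, q)$, $q \in Q_2$ then $\delta_2^*(s_2, second(a)) = q$ and $second(\lambda((x, q))) = \lambda_2(q)$.

3. If $\delta_1^*(s_1, a) = q$ then there is some $b$ such that $first(b) = a$ and $\delta^*(s, b) = (q, x)$ and $\lambda_1(q) = first(\lambda((q, x)))$. If $\delta_2^*(s_2, a) = q$ then there is some $b$ such that $second(b) = a$ and $\delta^*(s, b) = (x, q)$ and $\lambda_2(q) = second(\lambda((x, q)))$.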

4.2  Set  of  Description  Types 

Using  regular  trees,  we  now  define  the  set  of  types  of  our  type  system: 

Definition 15 (Set of Description Types) The set of description type constructors is the set $F_\tau = \{Record, Variant, Set\} \cup B$. A description type is a tree $\sigma \in R(F_\tau)$ satisfying the following conditions:

1. if $\sigma(a) = Set$ then $\{l \in \mathcal{L} \mid a \cdot l \in dom(\sigma)\} = \{elm_1\}$,

2. if $\sigma(a) = b \in B$, then the set $\{l \in \mathcal{L} \mid a \cdot l \in dom(\sigma)\}$ is empty.

A description type $\sigma$ is finite if it is finite as a tree. The set of all description types and the set of all finite description types are denoted by $Dtype^\infty$ and $Dtype$ respectively.

Record, Variant and Set represent the record, the variant and the set type constructors respectively. The condition (1) restricts set types to be "homogeneous" sets. Let $\sigma_1, \ldots, \sigma_n \in Dtype^\infty$. We use the following notations:

$[l_1 : \sigma_1, \ldots, l_n : \sigma_n]$ for $Record(l_1 = \sigma_1, \ldots, l_n = \sigma_n)$,

$\langle l_1 : \sigma_1, \ldots, l_n : \sigma_n \rangle$ for $Variant(l_1 = \sigma_1, \ldots, l_n = \sigma_n)$,

$\{|\sigma|\}$ for $Set(elm_1 = \sigma)$.
unit = []
point = [X-cord : int, Y-cord : int]
intlist = (rec u. ⟨Cons : [Head : int, Tail : u], Nil : unit⟩)
object = [Name : string, Age : int]
person = (rec p. [Name : string, Age : int, Parents : {|p|}])
employee = (rec e. [Name : string, Age : int, Parents : {|person|}, Salary : int, Boss : e])
student = [Name : string, Age : int, Parents : {|person|}, Course : {|string|}]
working-student = [Name : string, Age : int, Parents : {|person|}, Course : {|string|}, Salary : int,
                   Boss : employee]
flights = {|[Flight : [F-id : int, Date : string], Plane : string]|}
flown-by = {|[Plane : string, Pilots : {|[Name : string, Emp-id : int]|}]|}
schedule-data = {|[Flight : [F-id : int, Date : string], Plane : string,
                   Pilots : {|[Name : string, Emp-id : int]|}]|}

Figure 2: Examples of Description Types

Similar  shorthands  are  adopted  in  term  representations  of  regular  trees. 

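The set of finite description types $Dtype$ coincides with the following inductively defined set $Dtype^0$:

1. $b \in Dtype^0$ for any $b \in B$,

2. if $\sigma_1, \ldots, \sigma_n \in Dtype^0$ and $l_1, \ldots, l_n \in \mathcal{L}$ then $[l_1 : \sigma_1, \ldots, l_n : \sigma_n] \in Dtype^0$,

3. if $\sigma_1, \ldots, \sigma_n \in Dtype^0$ and $l_1, \ldots, l_n \in \mathcal{L}$ then $\langle l_1 : \sigma_1, \ldots, l_n : \sigma_n \rangle \in Dtype^0$,

4. if $\sigma \in Dtype^0$, then $\{|\sigma|\} \in Dtype^0$.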

Figure  2  shows  examples  of  description  types  in  term  representation.  In  this  example,  as  well  as  in  all 
other  examples  we  will  show  later,  identifiers  such  as  unit  are  used  purely  as  syntactic  shorthands  to  avoid 
repetitions  and  have  no  significance  themselves.  As  seen  in  these  examples,  infinite  trees  correspond  to 
recursively  defined  types. 

For the set $Dtype^\infty$, we define the following ordering to capture the ordering of the containment of the structures:

Definition 16 (Information Ordering on $Dtype^\infty$) Let $\sigma_1, \sigma_2 \in Dtype^\infty$. The information ordering $\leq$ on $Dtype^\infty$ is the relation defined as: $\sigma_1 \leq \sigma_2$ iff $dom(\sigma_1) \subseteq dom(\sigma_2)$ and for any $a \in dom(\sigma_1)$, $\sigma_1(a) = \sigma_2(a)$ and if $\sigma_1(a) = Variant$ then $\{l \in \mathcal{L} \mid a \cdot l \in dom(\sigma_1)\} = \{l \in \mathcal{L} \mid a \cdot l \in dom(\sigma_2)\}$.

This ordering can be regarded as a special case of the subsumption ordering on Aït-Kaci's $\psi$-terms [3]. The condition on variant nodes means that in order for two variant types to be ordered, they must have the same set of variants. The intuition behind this condition is that if a variant type $\sigma$ has a component $l : \sigma''$ and $\sigma'$ has no $l$-component, then for a value $v$ of the type $\sigma$ corresponding to the component $l : \sigma''$ there is no value $v'$ of the type $\sigma'$ that is related in structure to $v$, and therefore $\sigma$ and $\sigma'$ are not related.

unit ≤ point
unit ≤ object
object ≤ person
person ≤ employee
person ≤ student
employee ≤ working-student
student ≤ working-student
flights ≤ schedule-data
flown-by ≤ schedule-data

Figure 3: Examples of Ordering on Description Types

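The ordering $\leq$, when restricted to the set of finite description types $Dtype$, coincides with the following inductively defined relation $\leq^0$:

$b \leq^0 b$ for all $b \in B$

$[l_1 : \sigma_1, \ldots, l_n : \sigma_n] \leq^0 [l_1 : \sigma_1', \ldots, l_n : \sigma_n', \ldots]$ if $\sigma_i \leq^0 \sigma_i'$, $1 \leq i \leq n$

$\{|\sigma|\} \leq^0 \{|\sigma'|\}$ if $\sigma \leq^0 \sigma'$

$\langle l_1 : \sigma_1, \ldots, l_n : \sigma_n \rangle \leq^0 \langle l_1 : \sigma_1', \ldots, l_n : \sigma_n' \rangle$ if $\sigma_i \leq^0 \sigma_i'$, $1 \leq i \leq n$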

Figure 3 shows examples of the information ordering on $Dtype^\infty$ among the description types defined in figure 2.
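The inductive rules for finite types can be read as a recursive procedure. The Python sketch below assumes a hypothetical encoding in which a base type is a string, a record or variant type is a pair of a constructor name and a dictionary of components, and a set type is a pair `("Set", element_type)`; recursive types are not covered by this encoding.

```python
def leq(s, t):
    """Information ordering on finite description types."""
    if isinstance(s, str) or isinstance(t, str):
        return s == t                      # base types are only related to themselves
    (kind_s, xs), (kind_t, xt) = s, t
    if kind_s != kind_t:
        return False
    if kind_s == "Set":
        return leq(xs, xt)
    if kind_s == "Record":                 # the larger record may have extra fields
        return all(l in xt and leq(xs[l], xt[l]) for l in xs)
    # Variant: both sides must have exactly the same labels
    return xs.keys() == xt.keys() and all(leq(xs[l], xt[l]) for l in xs)


unit = ("Record", {})
point = ("Record", {"X-cord": "int", "Y-cord": "int"})
flight = ("Record", {"F-id": "int", "Date": "string"})
pilot = ("Record", {"Name": "string", "Emp-id": "int"})
flights = ("Set", ("Record", {"Flight": flight, "Plane": "string"}))
schedule_data = ("Set", ("Record", {"Flight": flight, "Plane": "string",
                                    "Pilots": ("Set", pilot)}))
```

Under this encoding the finite examples behave as in the figure: `unit` is below `point`, and `flights` is below `schedule_data`, while a variant with fewer labels is not below one with more.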

From the inductive characterization of $\leq$, it is easy to check that $(Dtype, \leq)$ is a poset with the pairwise bounded join property, that $\leq$ is decidable, and that least upper bounds (if they exist) are effectively computable. The following two propositions show that these properties still hold for general description types. Their proofs can be reconstructed from the proofs of the corresponding properties on $\psi$-terms [3] by checking the extra condition we imposed on the variant nodes.

Proposition 5 $(Dtype^\infty, \leq)$ is a poset with the pairwise bounded join property.

Proposition 6 The ordering $\leq$ on $Dtype^\infty$ is decidable, and for any description types $\sigma_1, \sigma_2$ it is decidable whether $\sigma_1, \sigma_2$ are consistent or not, and if consistent then their least upper bound is effectively computable.

Combining propositions 5 and 6, we have:

Theorem 5 $(Dtype^\infty, \leq)$ is a database type system.
The following are examples of least upper bounds of description types defined in figure 2:

employee ⊔ student = working-student
flights ⊔ flown-by = schedule-data

From the examples shown in figure 3 and the above examples, we can see that $\leq$ is a generalization of the information ordering on types in the relational model to complex structures, including recursive structures represented by infinite trees.
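For finite types, a least upper bound can be computed by merging the two structures and failing on any conflict. The following Python sketch uses an assumed encoding (a base type is a string, a record or variant type is a constructor name paired with a dictionary, a set type is `("Set", element_type)`), and the function name `lub` is hypothetical; recursive types such as employee are outside its scope.

```python
def lub(s, t):
    """Least upper bound of two finite description types, or None if inconsistent."""
    if isinstance(s, str) or isinstance(t, str):
        return s if s == t else None
    (kind_s, xs), (kind_t, xt) = s, t
    if kind_s != kind_t:
        return None
    if kind_s == "Set":
        elem = lub(xs, xt)
        return None if elem is None else ("Set", elem)
    if kind_s == "Variant" and xs.keys() != xt.keys():
        return None                        # variants must have the same labels
    merged = {}
    for label in xs.keys() | xt.keys():
        if label in xs and label in xt:
            comp = lub(xs[label], xt[label])
            if comp is None:
                return None
        else:                              # field present on one side only (records)
            comp = xs[label] if label in xs else xt[label]
        merged[label] = comp
    return (kind_s, merged)


flight = ("Record", {"F-id": "int", "Date": "string"})
pilots = ("Set", ("Record", {"Name": "string", "Emp-id": "int"}))
flights = ("Set", ("Record", {"Flight": flight, "Plane": "string"}))
flown_by = ("Set", ("Record", {"Plane": "string", "Pilots": pilots}))
schedule_data = ("Set", ("Record", {"Flight": flight, "Plane": "string",
                                    "Pilots": pilots}))
```

With these definitions `lub(flights, flown_by)` yields the encoding of schedule-data, matching the second example above.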

4.3  Universe  of  Descriptions 

In order to construct a model of $(Dtype^\infty, \leq)$, we first define a set of possible descriptions.

Definition 17 (Universe of Descriptions) The set of description constructors is the set $F_\delta = \{Record, Inj, Set\} \cup (\bigcup_{b \in B} B_b) \cup \{null_b \mid b \in B\}$. A description is a tree $d \in R(F_\delta)$ satisfying the following conditions: for all $a \in dom(d)$,

1. if $d(a) = Set$ then $\{l \in \mathcal{L} \mid a \cdot l \in dom(d)\} = \{elm_1, \ldots, elm_n\}$ for some $n \geq 0$,

2. if $d(a) = Inj$ then the set $\{l \in \mathcal{L} \mid a \cdot l \in dom(d)\}$ is either a singleton set or the empty set,

3. if $d(a) \in B_b$ or $d(a) = null_b$, then the set $\{l \in \mathcal{L} \mid a \cdot l \in dom(d)\}$ is the empty set.

A description $d$ is finite if it is finite as a tree. The set of all descriptions and the set of all finite descriptions are denoted by $Dobj^\infty$ and $Dobj$ respectively.

Inj is a variant constructor (injection to a variant type). An Inj node with no outgoing edge represents a null value of a variant type.

Let $d_1, \ldots, d_n \in Dobj^\infty$. We use the following notations:

$[l_1 = d_1, \ldots, l_n = d_n]$ for $Record(l_1 = d_1, \ldots, l_n = d_n)$,

$\{|d_1, \ldots, d_n|\}$ for $Set(elm_1 = d_1, \ldots, elm_n = d_n)$.

The set of finite descriptions $Dobj$ coincides with the following inductively defined set $Dobj^0$:

1. $c \in Dobj^0$ for any $c \in B_b$, $b \in B$,

2. $null_b \in Dobj^0$ for any $b \in B$,

3. if $d_1, \ldots, d_n \in Dobj^0$ and $l_1, \ldots, l_n \in \mathcal{L}$ then $[l_1 = d_1, \ldots, l_n = d_n] \in Dobj^0$,

4. $Inj \in Dobj^0$,

5. if $d \in Dobj^0$ and $l \in \mathcal{L}$ then $Inj(l = d) \in Dobj^0$,

6. if $d_1, \ldots, d_n \in Dobj^0$ then $\{|d_1, \ldots, d_n|\} \in Dobj^0$, $0 \leq n$.

Figure 4 shows examples of descriptions.

Unity = []
Point23 = [X-cord = 2, Y-cord = 3]
Onelist = Inj(Cons = [Head = 1, Tail = Inj(Nil = Unity)])
Null-person = (rec p. [Name = null_string, Age = null_int, Parents = {|p|}])
Null-employee = (rec e. [Name = null_string, Age = null_int, Parents = {|Null-person|},
                 Salary = null_int, Boss = e])
John = [Name = "John Smith", Age = 34, Parents = {|Null-person|},
        Salary = 23000, Boss = Null-employee]
Mary1 = [Name = "Mary Blake", Age = 21, Parents = {|Null-person|},
         Course = {|"math120", "phil340", "logic110"|}]
Mary2 = [Name = "Mary Blake", Age = 21, Parents = {|Null-person|},
         Salary = 9000, Boss = John]
Mary3 = [Name = "Mary Blake", Age = 21, Parents = {|Null-person|},
         Course = {|"math120", "phil340", "logic110"|},
         Salary = 9000, Boss = John]
Flights = {|[Flight = [F-id = 001, Date = "8 Aug"], Plane = "Concord"],
            [Flight = [F-id = 83, Date = "9 Aug"], Plane = "707"],
            [Flight = [F-id = 116, Date = "10 Aug"], Plane = "747"]|}
Flown-by = {|[Plane = "Concord", Pilots = {|[Name = "Jones", Emp-id = 5566]|}],
             [Plane = "707", Pilots = {|[Name = "Clark", Emp-id = 1122],
                                        [Name = "Copely", Emp-id = 2233],
                                        [Name = "Chin", Emp-id = 3344]|}],
             [Plane = "747", Pilots = {|[Name = "Clark", Emp-id = 1122],
                                        [Name = "Jones", Emp-id = 5566]|}]|}
Schedule-data = {|[Plane = "Concord", Pilots = {|[Name = "Jones", Emp-id = 5566]|},
                   Flight = [F-id = 001, Date = "8 Aug"]],
                  [Plane = "707", Pilots = {|[Name = "Clark", Emp-id = 1122],
                                             [Name = "Copely", Emp-id = 2233],
                                             [Name = "Chin", Emp-id = 3344]|},
                   Flight = [F-id = 83, Date = "9 Aug"]],
                  [Plane = "747", Pilots = {|[Name = "Clark", Emp-id = 1122],
                                             [Name = "Jones", Emp-id = 5566]|},
                   Flight = [F-id = 116, Date = "10 Aug"]]|}

Figure 4: Examples of Descriptions
4.4 Typing Relation

Description types represent structures of descriptions. A description $d$ has a description type $\sigma$ if $d$ has the structure represented by $\sigma$. This relationship is formalized by the typing relation:

Definition 18 (Typing Relation) Let $\approx$ be the equivalence relation on $\mathcal{L}$ defined as $l_1 \approx l_2$ iff $l_1 = l_2$ or $l_1 = elm_i$, $l_2 = elm_j$ for some $i, j$. Define the consistency relation $\mathrel{\hat{:}}$ between $F_\delta$ and $F_\tau$ as follows: $f \mathrel{\hat{:}} g$ iff one of the following holds:

1. $f = g$,

2. $f = Inj$ and $g = Variant$,

3. $f \in B_g$ and $g \in B$,

4. $f = null_g$ and $g \in B$.

The typing relation $d : \sigma$ between $Dobj^\infty$ and $Dtype^\infty$ is defined as: $d : \sigma$ iff for all $a \in dom(d)$,

1. there is some $a'$ such that $a \mathrel{\bar{\approx}} a'$ and $d(a) \mathrel{\hat{:}} \sigma(a')$,

2. if $d(a) = Record$ then $\{l \in \mathcal{L} \mid a \cdot l \in dom(d)\} = \{l \in \mathcal{L} \mid a' \cdot l \in dom(\sigma)\}$,

3. if $d(a) = Inj$ then if $a \cdot l \in dom(d)$ for some $l \in \mathcal{L}$ then $l \in \{l \in \mathcal{L} \mid a' \cdot l \in dom(\sigma)\}$.

The equivalence relation $\approx$ "ignores" the difference due to the positions $elm_1, \ldots, elm_n$ of occurrences of subtrees in the set constructor $Set(elm_1 = d_1, \ldots, elm_n = d_n)$.

When restricted to the set of finite descriptions $Dobj$, the above typing relation coincides with the following relation $:^0$ on $Dobj \times Dtype^\infty$ defined by induction on $Dobj$:

1. $c :^0 b$ for all $c \in B_b$,
2. $null_b :^0 b$,
3. if $d_1 :^0 \sigma_1, \ldots, d_n :^0 \sigma_n$ and $l_1, \ldots, l_n \in \mathcal{L}$ then $[l_1 = d_1, \ldots, l_n = d_n] :^0 [l_1 : \sigma_1, \ldots, l_n : \sigma_n]$,
4. $Inj :^0 \langle l_1 : \sigma_1, \ldots, l_n : \sigma_n \rangle$,
5. if $d :^0 \sigma$ then $Inj(l = d) :^0 \langle \ldots, l : \sigma, \ldots \rangle$,
6. if $d_1 :^0 \sigma, \ldots, d_n :^0 \sigma$ then $\{d_1, \ldots, d_n\} :^0 \{\sigma\}$.

Note however that $d \in Dobj$ and $d :^0 \sigma$ do not imply that $\sigma \in Dtype$ because of variant types, i.e. rule 4 in the above definition.
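The six rules of the inductive characterization translate almost line by line into a recursive check. The following is a minimal sketch, not the paper's implementation: the encoding of descriptions and types as Python values (dicts for records, tagged tuples for null values, variants and sets) and the names `BASE` and `has_type` are assumptions made only for illustration, and only finite descriptions and finite types are covered.

```python
# Assumed encoding (illustrative only):
#   descriptions: int/str constants, ("null", b), dict for records,
#                 ("inj",) or ("inj", label, d) for variants, ("set", [d, ...])
#   types:        "int" / "string", ("record", {l: t}), ("variant", {l: t}), ("set", t)
BASE = {"int": int, "string": str}


def has_type(d, t):
    """Inductive typing relation on finite descriptions (rules 1-6)."""
    if isinstance(t, str):
        if d == ("null", t):                      # rule 2: null_b has type b
            return True
        # rule 1: a constant of B_b has type b (Python bools are not treated as ints)
        return isinstance(d, BASE[t]) and not isinstance(d, bool)
    kind = t[0]
    if kind == "record":                          # rule 3: same labels, fieldwise typing
        fields = t[1]
        return (isinstance(d, dict) and d.keys() == fields.keys()
                and all(has_type(d[l], fields[l]) for l in fields))
    if kind == "variant":
        if d == ("inj",):                         # rule 4: Inj has every variant type
            return True
        return (isinstance(d, tuple) and len(d) == 3 and d[0] == "inj"
                and d[1] in t[1] and has_type(d[2], t[1][d[1]]))   # rule 5
    if kind == "set":                             # rule 6: every element has type sigma
        return (isinstance(d, tuple) and len(d) == 2 and d[0] == "set"
                and all(has_type(x, t[1]) for x in d[1]))
    return False
```

For instance, with `pilot_t = ("record", {"Name": "string", "Emp-id": "int"})`, the record `{"Name": "Clark", "Emp-id": 1122}` is accepted, and so is the same record with `("null", "string")` in place of the name, while a record missing the `Emp-id` field is rejected because rule 3 requires identical label sets.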

Figure 5 shows examples of typing relations that hold between descriptions defined in figure 4 and description types defined in figure 2.

Unity : unit
PointSS : point
0n< list : intlist
Null-person : person
Null-employee : employee
John : employee
Mary1 : student
Mary2 : employee
Mary3 : working-student
Flights : flights
Flown-by : flown-by
Schedule-data : schedule-data

Figure 5: Examples of Typing Relation

From the above inductive characterization of the typing relation, it is easy to check that for any finite description $d$ and any description type $\sigma$ it is decidable whether $d : \sigma$ or not. This property is essential to develop a type inference system. Fortunately, this property still holds for general descriptions:

Proposition 7 For any $d \in Dobj^\infty$, $\sigma \in Dtype^\infty$, the property $d : \sigma$ is decidable.

Proof Let $M_d = (Q_1, s_1, F_d, \delta_1, \lambda_1)$ and $M_\sigma = (Q_2, s_2, F_\tau, \delta_2, \lambda_2)$ be Moore machines representing $d$ and $\sigma$ respectively. Let $M = (Q, s, F, \delta, \lambda)$ be the product machine $(M_d \times M_\sigma)/\approx$ where $\approx$ is the equivalence relation on $\mathcal{L}$ defined in definition 18. We show that $d : \sigma$ iff $M$ satisfies the following conditions: for any reachable state $q$,

1. if $q = (q_1, x)$, $q_1 \in Q_1$ then $x \in Q_2$ and $\lambda(q) = (f, g)$ such that $f :^b g$,
2. if $q = (q_1, q_2)$, $q_1 \in Q_1$, $q_2 \in Q_2$, $\lambda(q) = (Record, Record)$ and $\delta(q, l) = q'$ then $l = (l', l')$, $l' \neq \$$,
3. if $q = (q_1, q_2)$, $q_1 \in Q_1$, $q_2 \in Q_2$, $\lambda(q) = (Inj, Variant)$ and $\delta(q, l)$ is defined then $l = (l', l')$ or $l = (\$, l')$ for some $l' \neq \$$.

By lemma 3, $M$ satisfies condition 1 iff for any $a \in dom(M_d(s_1))$, there is some $a'$ such that $a \approx a'$, $\delta_1^*(s_1, a) = q_1$, $\delta_2^*(s_2, a') = q_2$, and $\lambda_1(q_1) :^b \lambda_2(q_2)$. Since $M_d, M_\sigma$ represent $d, \sigma$ respectively, this condition is equivalent to condition 1 of the definition of the typing relation. The equivalences of conditions 2, 3 above and conditions 2, 3 of the definition of the typing relation are immediate consequences of their definitions.

Since $M$ is effectively constructed and the above property is clearly decidable, the proposition is proved. ∎

4.5 Description Domains

By the typing relation, we can identify for each description type the corresponding set of descriptions. By defining a proper ordering, we turn this set into a description domain. For a pair of trees $d_1, d_2$, Courcelle described [19] the notion of a coherent and simplifiable relation on $Subtrees(d_1) \times Subtrees(d_2)$ as a relation $\sim$ satisfying the condition that if

$f(l_1 = d_1, \ldots, l_n = d_n) \sim g(l_1 = d'_1, \ldots, l_n = d'_n)$

then $d_i \sim d'_i$, $1 \le i \le n$ and $f = g$. By generalizing this and combining it with the Smyth powerdomain preorder, we can generalize the information ordering on flat descriptions to $Dobj^\infty$:

Definition 19 (Information Preorder on $Dobj^\infty$) The information ordering on the set $F_d$ of description constructors is the following partial ordering $\sqsubseteq^b$:

$f \sqsubseteq^b g$ iff $f = g$ or $f = null_b$ and $g \in B_b$.

The information preorder $\preceq$ on $Dobj^\infty$ is the relation defined as: $d_1 \preceq d_2$ iff there is a relation $\sim$, called substructure relation, on $Subtrees(d_1) \times Subtrees(d_2)$ satisfying the following properties:

1. $d_1 \sim d_2$,
2. if $d \sim d'$ then $d(\epsilon) \sqsubseteq^b d'(\epsilon)$,
3. if $d \sim d'$ and $d(\epsilon) = Record$ then $\{l \in \mathcal{L} \mid l \in dom(d)\} = \{l \in \mathcal{L} \mid l \in dom(d')\}$ and for all $l \in \{l \in \mathcal{L} \mid l \in dom(d)\}$, $d/l \sim d'/l$,
4. if $d \sim d'$, $d(\epsilon) = Variant$ and $l \in dom(d)$ then $l \in dom(d')$ and $d/l \sim d'/l$,
5. if $d \sim d'$, $d(\epsilon) = Set$ then for all $l \in \{l \in \mathcal{L} \mid l \in dom(d')\}$ there is some $l' \in \{l \in \mathcal{L} \mid l \in dom(d)\}$ such that $d/l' \sim d'/l$.

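This ordering is also closely related to Smyth simulation on a certain class of directed graphs defined in [49]. The relation $\preceq$, when restricted to the set of finite descriptions $Dobj$, coincides with the following inductively defined relation $\preceq^0$:

$c \preceq^0 c$ for all $c \in B_b$,
$null_b \preceq^0 c$ for all $c \in B_b$,
$null_b \preceq^0 null_b$,
$[l_1 = d_1, \ldots, l_n = d_n] \preceq^0 [l_1 = d'_1, \ldots, l_n = d'_n]$ if $d_i \preceq^0 d'_i$, $1 \le i \le n$,
$Inj \preceq^0 Inj(l = d)$ for all $d$,
$Inj(l = d) \preceq^0 Inj(l = d')$ if $d \preceq^0 d'$,
$\{d_1, \ldots, d_n\} \preceq^0 \{d'_1, \ldots, d'_m\}$ if $\forall d' \in \{d'_1, \ldots, d'_m\}\ \exists d \in \{d_1, \ldots, d_n\}$ such that $d \preceq^0 d'$.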

On a substructure relation the following property holds:

Lemma 4 Let $d_1 \preceq d_2$ and $\sim$ be a substructure relation on $Subtrees(d_1) \times Subtrees(d_2)$. For $d'_1 \in Subtrees(d_1)$, $d'_2 \in Subtrees(d_2)$, if $d'_1 \sim d'_2$ then $d'_1 \preceq d'_2$.

Proof Immediate consequence of the fact that the restriction of a substructure relation $\sim$ to $Subtrees(d'_1) \times Subtrees(d'_2)$ is also a substructure relation. ∎

We next show that $\preceq$ is a preorder having the desired properties. Rounds' recent work [49] also shows a similar result in a slightly different framework.

Proposition 8 The relation $\preceq$ is a preorder on $Dobj^\infty$ with the pairwise bounded join property.

The strategy of the following rather long proof is the combination of the technique suggested in [3] to construct a least upper bound of two regular trees by tracing the moves of two Moore machines representing them in parallel and the property of the Smyth powerdomain preorder shown in [56] that if $s_1$ and $s_2$ are finite subsets of a poset then $\{d_1 \sqcup d_2 \mid d_1 \in s_1, d_2 \in s_2$ and $d_1 \sqcup d_2$ exists$\}$ is a least upper bound of $s_1$ and $s_2$ under the Smyth preorder.

Proof For any description $d$, the identity relation on $Subtrees(d)$ is a substructure relation and $d \preceq d$. Suppose $d_1 \preceq d_2$ and $d_2 \preceq d_3$. Let $\sim_1$ and $\sim_2$ be substructure relations on $Subtrees(d_1) \times Subtrees(d_2)$ and $Subtrees(d_2) \times Subtrees(d_3)$ respectively. Then the composition of the two relations $\sim_1, \sim_2$ also satisfies the conditions of substructure relation. Therefore $d_1 \preceq d_3$ and $\preceq$ is a preorder.

We next show that $\preceq$ has the pairwise bounded join property by showing the following stronger property: there is an algorithm that, given any two descriptions $d_1, d_2$, determines whether $d_1, d_2$ have an upper bound and, if they do, computes (one of) their least upper bounds. Let $M_1 = (Q_1, s_1, F_d, \delta_1, \lambda_1)$ and $M_2 = (Q_2, s_2, F_d, \delta_2, \lambda_2)$ be Moore machines representing $d_1, d_2$ respectively. Let $M$ be the product machine $(M_1 \times M_2)/\approx$. We say that a state $q$ in $M$ is consistent iff it satisfies the condition that if $q = (q_1, q_2)$ for some $q_1 \in Q_1$, $q_2 \in Q_2$ then $\lambda(q) = (f, g)$ for some $f, g \in F_d$ such that $f, g$ have an upper bound and the following conditions are satisfied:

1. if $\lambda(q) = (Record, Record)$ then for all $l$, if $\delta(q, l)$ is defined and equal to $q'$ then $l = (l', l')$ for some $l'$ and $q'$ is consistent,
2. if $\lambda(q) = (Inj, Inj)$ then there is at most one $l$ such that $\delta(q, l) = q'$ and if $\delta(q, (l', l')) = q'$ for some $l'$ then $q'$ is consistent.

We first show that if $d_1, d_2$ have an upper bound then $s$ is consistent. Suppose $s$ is not consistent. Then there is some $a \in \mathcal{L}^*$ such that (1) $\delta^*(s, a) = (q_1, q_2)$, $q_1 \in Q_1$, $q_2 \in Q_2$, (2) for any prefix $b$ of $a$, $\lambda(\delta^*(s, b))$ is either $(Record, Record)$ or $(Inj, Inj)$, and (3) one of the following holds: (a) $\lambda((q_1, q_2)) = (f, g)$ such that $\{f, g\}$ has no upper bound, (b) $\lambda((q_1, q_2)) = (Record, Record)$ and there is some $(l_1, l_2)$, $l_1 \neq l_2$ such that $\delta((q_1, q_2), (l_1, l_2))$ is defined, (c) $\lambda((q_1, q_2)) = (Inj, Inj)$ and there are at least two distinct $l_1, l_2$ such that both $\delta((q_1, q_2), l_1)$ and $\delta((q_1, q_2), l_2)$ are defined. Now suppose to the contrary that there is some $d$ such that $d_1 \preceq d$ and $d_2 \preceq d$. Let $a$ be a string satisfying conditions (1) and (2). Then by lemma 4, $d_1/a \preceq d/a$ and $d_2/a \preceq d/a$, which contradicts condition (3).

Next we show that if $s$ is consistent then $d_1, d_2$ have a least upper bound by constructing one. Suppose $s$ is consistent. Define $M' = (Q, s, F_d, \delta', \lambda')$ from $M$ as follows:

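1. $Q, s$ are same as $M$,

2. $\delta'(q, l)$ is defined and equal to $q'$ iff one of the following holds:

(a) $\lambda(q) = (Record, Record)$ and $\delta(q, (l, l)) = q'$,

(b) $\lambda(q) = (Set, Set)$, $l = elm_i$ and $\delta(q, (elm_j, elm_k)) = q'$ where $(elm_j, elm_k)$ is the $i$th smallest symbol under the total order on $\mathcal{L}$ in the set $\{(elm_m, elm_n) \mid \delta(q, (elm_m, elm_n))$ is defined and consistent$\}$,

(c) $\lambda(q) = (Inj, Inj)$ and one of the following holds: (i) $\delta(q, (l, l)) = q'$, (ii) $\delta(q, (l, \$)) = q'$ or (iii) $\delta(q, (\$, l)) = q'$,

(d) $q = (q_1, \$)$ and $\delta(q, (l, \$)) = q'$, or $q = (\$, q_2)$ and $\delta(q, (\$, l)) = q'$.

3. $\lambda'(q)$ is defined by cases:

$\lambda'(q) = x \sqcup y$ if $\lambda(q) = (x, y)$, $x, y \in F_d$ and $x \sqcup y$ exists,
$\lambda'(q) = x$ if $\lambda(q) = (x, \$)$,
$\lambda'(q) = y$ if $\lambda(q) = (\$, y)$,
$\lambda'(q) = \$$ otherwise.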

We show that $M'(s)$ is a least upper bound of $d_1, d_2$. Let $S_1 = \{M_1(q) \mid q \in Q_1, q$ reachable$\}$, $S_2 = \{M_2(q) \mid q \in Q_2, q$ reachable$\}$, and $S = \{M'(q) \mid q \in Q, q$ reachable$\}$. Then $S_1 = Subtrees(d_1)$, $S_2 = Subtrees(d_2)$ and $S = Subtrees(M'(s))$. Define the relation $\sim_1$ between $S_1$ and $S$ as $M_1(q) \sim_1 M'(q')$ iff $q' = (q, x)$ for some $x$. Then it is easily checked that this relation satisfies the conditions of substructure relation and therefore $d_1 \preceq M'(s)$. Similarly $d_2 \preceq M'(s)$. Let $d$ be any upper bound of $d_1, d_2$. Let $\sim'_1, \sim'_2$ be substructure relations on $Subtrees(d_1) \times Subtrees(d)$ and $Subtrees(d_2) \times Subtrees(d)$ respectively. Define the relation $\sim$ on $S \times Subtrees(d)$ as $M'(q) \sim d'$ iff one of the following holds: (1) $q = (q_1, \$)$, $M_1(q_1) \sim'_1 d'$, (2) $q = (\$, q_2)$, $M_2(q_2) \sim'_2 d'$, or (3) $q = (q_1, q_2)$, $M_1(q_1) \sim'_1 d'$, $M_2(q_2) \sim'_2 d'$. Then $\sim$ clearly satisfies conditions 1, 2, 3, 4 of the definition of substructure relation. For condition 5 of substructure relation, suppose $M'(q) \sim d'$ and $M'(q)(\epsilon) = Set$. If $q = (q_1, \$)$ or $q = (\$, q_2)$ then condition 5 follows from the fact that $\sim'_1, \sim'_2$ are substructure relations. Suppose $q = (q_1, q_2)$. Then $M_1(q_1) \sim'_1 d'$ and $M_2(q_2) \sim'_2 d'$. If $l \in dom(d')$ for some $l \in \mathcal{L}$, then there are some $l_1, l_2 \in \mathcal{L}$ such that $\delta_1(q_1, l_1) = q'_1$, $\delta_2(q_2, l_2) = q'_2$, $M_1(q'_1) \sim'_1 d'/l$ and $M_2(q'_2) \sim'_2 d'/l$. By lemma 4, $M_1(q'_1) \preceq d'/l$ and $M_2(q'_2) \preceq d'/l$. Let $M'_1, M'_2, M''$ be respectively the Moore machines obtained from $M_1, M_2, M'$ by respectively replacing their start states with $q'_1, q'_2, (q'_1, q'_2)$. Clearly $M_1(q'_1) = M'_1(q'_1)$, $M_2(q'_2) = M'_2(q'_2)$ and $M'' = (M'_1 \times M'_2)/\approx$. Since $M_1(q'_1)$ and $M_2(q'_2)$ have an upper bound, $(q'_1, q'_2)$ is consistent. By definition, $l_1 = elm_i$ and $l_2 = elm_j$ for some $i, j$. Then by definition of $M'$ there is some $l'$ such that $\delta'(q, l') = (q'_1, q'_2)$ and therefore $M'(q)/l' \sim d'/l$.

Since $M'$ is effectively constructed, the proposition is proved. ∎

The above proof also establishes that least upper bounds under $\preceq$ are effectively computable. For the Moore machine $M'$ defined in the above proof, the following property can also be easily shown: $d_1 \preceq d_2$ iff $M'$ satisfies the condition that for all reachable states $q$ in $M'$, if $q$ is consistent then it is of the form $q = (x, q_2)$ and if $q = (q_1, q_2)$, $q_1 \in Q_1$, $q_2 \in Q_2$ then $\lambda_1(q_1) \sqsubseteq^b \lambda_2(q_2)$. Therefore we have:

Proposition 9 The relation $\preceq$ on $Dobj^\infty$ is decidable and least upper bounds (if they exist) are effectively computable.

The next proposition shows that typing is preserved by least upper bounds.

Proposition 10 If $d_1 : \sigma$, $d_2 : \sigma$ and $d$ is a least upper bound of $d_1, d_2$ then $d : \sigma$.

Proof Let $d_1, d_2$ be any descriptions and $M'$ be the Moore machine representing a least upper bound of $d_1$ and $d_2$ constructed in the proof of proposition 8. By the construction of $M'$, for any $a \in dom(M'(s))$ either there is some $b \in dom(d_1)$ such that $a \approx b$ and $d_1(b) \sqsubseteq^b M'(s)(a)$ or there is some $c \in dom(d_2)$ such that $a \approx c$ and $d_2(c) \sqsubseteq^b M'(s)(a)$. Since for any $x, y \in F_d$, if $x \sqsubseteq^b y$ and $x :^b f$ for some $f \in F_\tau$ then $y :^b f$, in either case $a$ satisfies the conditions of the definition of the typing relation $d : \sigma$. ∎

Definition 20 For any description type $\sigma \in Dtype^\infty$, the description domain $D_\sigma$ associated with $\sigma$ is the poset $[(\{d \mid d : \sigma\}, \preceq)]$.

Theorem 6 For any $\sigma$, $D_\sigma$ is a description domain.

Proof We show that $D_\sigma$ has a bottom element. By definition of $D_\sigma$, it suffices to show the existence of a description $d$ such that $d \preceq d'$ for all $d' \in \{d \mid d : \sigma\}$. Define a mapping $nullval : F_\tau \to F_d$ as

$nullval(f) = null_f$ if $f \in B$,
$nullval(f) = Inj$ if $f = Variant$,
$nullval(f) = f$ otherwise.

For any $\sigma$, define the description $Null(\sigma)$ as follows:

1. $a \in dom(Null(\sigma))$ iff $a \in dom(\sigma)$ and there is no proper prefix $b$ of $a$ such that $\sigma(b) = Variant$, and
2. for all $a \in dom(Null(\sigma))$, $Null(\sigma)(a) = nullval(\sigma(a))$.

From this definition, it is easy to check that $Null(\sigma) : \sigma$ and $Null(\sigma) \preceq d$ for any description $d : \sigma$. Then the theorem follows from propositions 8, 9, 10 and lemma 1. ∎
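For a finite type, the bottom element can be built by walking the type and applying `nullval` at each node, stopping below variants. The sketch below is illustrative: the Python encoding of types and descriptions and the name `null_of` are assumptions, and it further assumes that a set type has a single child holding its element type, so that the bottom of a set type is the singleton set containing the bottom of the element type.

```python
# Assumed encoding:
#   types:        "int" / "string", ("record", {l: t}), ("variant", {l: t}), ("set", t)
#   descriptions: ("null", b), dict records, ("inj",), ("set", [d, ...])

def null_of(t):
    """Bottom description of a finite type, following nullval node by node."""
    if isinstance(t, str):
        return ("null", t)                  # nullval(b) = null_b
    kind = t[0]
    if kind == "record":
        # nullval(Record) = Record; recurse into every field
        return {l: null_of(s) for l, s in t[1].items()}
    if kind == "variant":
        # nullval(Variant) = Inj; paths below a variant node are cut off
        return ("inj",)
    if kind == "set":
        # nullval(Set) = Set; assumed single child for the element type
        return ("set", [null_of(t[1])])
    raise ValueError("unknown type constructor: %r" % (kind,))
```

As an example, a record type with a string field, a variant field and a set-of-strings field yields a record whose fields are `("null", "string")`, `("inj",)` and `("set", [("null", "string")])` respectively.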

4.6  A  Model  of  the  Type  System 


We  now  define  the  set  of  embedding-projection  pairs  to  connect  the  set  of  description  domains  and  turn 
them  into  a  database  domain. 

For defining functions and properties on $D_\sigma$, the following definitions and results are useful. Let $(P_1, \le_1)$, $(P_2, \le_2)$ be preordered sets. A function $f : P_1 \to P_2$ is monotone iff for any $p_1, p_2 \in P_1$, if $p_1 \le_1 p_2$ then $f(p_1) \le_2 f(p_2)$. For a monotone function $f : P_1 \to P_2$, define $[f] : P_1/{=} \to P_2/{=}$ as $[f]([x]) = [f(x)]$. Since $f$ is monotone, $[f]$ is well defined. It is also clear that $[f]$ is monotone. The following lemma is an immediate consequence of the definition.

Lemma 5 Let $(P_1, \le_1)$, $(P_2, \le_2)$ be preordered sets and $f : P_1 \to P_2$, $g : P_2 \to P_1$ be monotone functions. If for all $p \in P_1$, $g(f(p)) = p$ and for all $p \in P_2$, $f(g(p)) \le_2 p$ then $([f], [g])$ is an embedding-projection pair between $[(P_1, \le_1)]$ and $[(P_2, \le_2)]$.


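Definition 21 Let $\sigma_1, \sigma_2 \in Dtype^\infty$ such that $\sigma_1 \le \sigma_2$. $\phi_{\sigma_1 \to \sigma_2}$ is a function from $D_{\sigma_1}$ to $D_{\sigma_2}$ defined as follows: $a \in dom(\phi_{\sigma_1 \to \sigma_2}(d))$ iff either (1) $a \in dom(d)$ or (2) $a \in dom(\sigma_2)$ satisfying the following conditions:

1. if $b$ is the longest prefix of $a$ such that $b \in dom(d)$ then $d(b) = Record$, and
2. $a$ has no proper prefix $b$ such that $b \notin dom(d)$ and $\sigma_2(b) = Variant$,

and for any $a \in dom(\phi_{\sigma_1 \to \sigma_2}(d))$,

$\phi_{\sigma_1 \to \sigma_2}(d)(a) = d(a)$ if $a \in dom(d)$,
$\phi_{\sigma_1 \to \sigma_2}(d)(a) = nullval(\sigma_2(a))$ otherwise,

where $nullval$ is the function from $F_\tau$ to $F_d$ defined in the proof of theorem 6.

$\psi_{\sigma_2 \to \sigma_1}$ is a mapping from $D_{\sigma_2}$ to $D_{\sigma_1}$ defined as follows: for any $d \in D_{\sigma_2}$, $\psi_{\sigma_2 \to \sigma_1}(d)$ is the restriction of $d$ such that $a \in dom(\psi_{\sigma_2 \to \sigma_1}(d))$ iff $a \in dom(d)$ and there is some $b \in dom(\sigma_1)$ such that $a \approx b$.

Define

$Emb^\infty = \{\phi_{\sigma_1 \to \sigma_2} \mid \sigma_1, \sigma_2 \in Dtype^\infty, \sigma_1 \le \sigma_2\}$
$Emb = \{\phi_{\sigma_1 \to \sigma_2} \mid \sigma_1, \sigma_2 \in Dtype, \sigma_1 \le \sigma_2\}$
$Proj^\infty = \{\psi_{\sigma_1 \to \sigma_2} \mid \sigma_1, \sigma_2 \in Dtype^\infty, \sigma_2 \le \sigma_1\}$
$Proj = \{\psi_{\sigma_1 \to \sigma_2} \mid \sigma_1, \sigma_2 \in Dtype, \sigma_2 \le \sigma_1\}$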

For $Emb$ and $Proj$, there are inductive definitions. We first define functors (function constructions) for records, variants and sets.

1. Records.

Let $f_1 : \sigma_1^1 \to \sigma_1^2, \ldots, f_n : \sigma_n^1 \to \sigma_n^2$ be any functions and $c_{n+1}, \ldots, c_{n+m}$ be any constants. $[l_1 = f_1, \ldots, l_n = f_n, l_{n+1} = c_{n+1}, \ldots, l_{n+m} = c_{n+m}]$ is the function on records of type $[l_1 : \sigma_1^1, \ldots, l_n : \sigma_n^1]$ defined as

$[l_1 = f_1, \ldots, l_n = f_n, l_{n+1} = c_{n+1}, \ldots, l_{n+m} = c_{n+m}]([l_1 = d_1, \ldots, l_n = d_n]) = [l_1 = f_1(d_1), \ldots, l_n = f_n(d_n), l_{n+1} = c_{n+1}, \ldots, l_{n+m} = c_{n+m}]$

and $[l_1 = f_1, \ldots, l_k = f_k, l_{k+1} = \$, \ldots, l_n = \$]$, $k \le n$, is the function on records of type $[l_1 : \sigma_1^1, \ldots, l_k : \sigma_k^1, l_{k+1} : \sigma_{k+1}, \ldots, l_n : \sigma_n]$ for some $\sigma_{k+1}, \ldots, \sigma_n$ defined as

$[l_1 = f_1, \ldots, l_k = f_k, l_{k+1} = \$, \ldots, l_n = \$]([l_1 = d_1, \ldots, l_n = d_n]) = [l_1 = f_1(d_1), \ldots, l_k = f_k(d_k)]$

2. Variants.

Let $f_1 : \sigma_1^1 \to \sigma_1^2, \ldots, f_n : \sigma_n^1 \to \sigma_n^2$ be any functions. $\langle l_1 = f_1, \ldots, l_n = f_n \rangle$ is the function on variants of type $\langle l_1 : \sigma_1^1, \ldots, l_n : \sigma_n^1 \rangle$ defined as

$\langle l_1 = f_1, \ldots, l_n = f_n \rangle(Inj) = Inj$
$\langle l_1 = f_1, \ldots, l_n = f_n \rangle(Inj(l_i = d)) = Inj(l_i = f_i(d))$, $1 \le i \le n$

3. Sets.

Let $f : \sigma_1 \to \sigma_2$ be any function. $\{f\}$ is the function on sets of type $\{\sigma_1\}$ defined as

$\{f\}(\{d_1, \ldots, d_n\}) = \{f(d_1), \ldots, f(d_n)\}$

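Then $Emb$ coincides with the following inductively defined set $Emb^0$:

1. $id_b \in Emb^0$ for any $b \in B$ where $id_b$ is the identity function on $B_b$,
2. if $\phi_{\sigma_1^1 \to \sigma_1^2}, \ldots, \phi_{\sigma_n^1 \to \sigma_n^2} \in Emb^0$ and $\sigma_{n+1}, \ldots, \sigma_{n+m} \in Dtype$ then $[l_1 = \phi_{\sigma_1^1 \to \sigma_1^2}, \ldots, l_n = \phi_{\sigma_n^1 \to \sigma_n^2}, l_{n+1} = Null(\sigma_{n+1}), \ldots, l_{n+m} = Null(\sigma_{n+m})] \in Emb^0$ where $Null(\sigma)$ is the value defined in theorem 6,
3. if $\phi_{\sigma_1^1 \to \sigma_1^2}, \ldots, \phi_{\sigma_n^1 \to \sigma_n^2} \in Emb^0$ then $\langle l_1 = \phi_{\sigma_1^1 \to \sigma_1^2}, \ldots, l_n = \phi_{\sigma_n^1 \to \sigma_n^2} \rangle \in Emb^0$,
4. if $\phi_{\sigma_1 \to \sigma_2} \in Emb^0$ then $\{\phi_{\sigma_1 \to \sigma_2}\} \in Emb^0$.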

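$Proj$ coincides with the following inductively defined set $Proj^0$:

1. $id_b \in Proj^0$ where $id_b$ is the identity function on $B_b$,
2. if $\psi_{\sigma_1^1 \to \sigma_1^2}, \ldots, \psi_{\sigma_n^1 \to \sigma_n^2} \in Proj^0$ then $[l_1 = \psi_{\sigma_1^1 \to \sigma_1^2}, \ldots, l_n = \psi_{\sigma_n^1 \to \sigma_n^2}, l_{n+1} = \$, \ldots, l_{n+m} = \$] \in Proj^0$,
3. if $\psi_{\sigma_1^1 \to \sigma_1^2}, \ldots, \psi_{\sigma_n^1 \to \sigma_n^2} \in Proj^0$ then $\langle l_1 = \psi_{\sigma_1^1 \to \sigma_1^2}, \ldots, l_n = \psi_{\sigma_n^1 \to \sigma_n^2} \rangle \in Proj^0$,
4. if $\psi_{\sigma_1 \to \sigma_2} \in Proj^0$ then $\{\psi_{\sigma_1 \to \sigma_2}\} \in Proj^0$.

Proposition 11 For any $\sigma_1, \sigma_2$ such that $\sigma_1 \le \sigma_2$, $([\phi_{\sigma_1 \to \sigma_2}], [\psi_{\sigma_2 \to \sigma_1}])$ is an embedding-projection pair between $D_{\sigma_1}$ and $D_{\sigma_2}$.

Proof For any element $d \in D_{\sigma_1}$, let $d' = \phi_{\sigma_1 \to \sigma_2}(d)$ and $d'' = \psi_{\sigma_2 \to \sigma_1}(d')$. By definition of $\phi_{\sigma_1 \to \sigma_2}$, $dom(d) \subseteq dom(d')$ and for any $a \in dom(d)$, $d'(a) = d(a)$. By definition of $\psi_{\sigma_2 \to \sigma_1}$, $a \in dom(d'')$ iff $a \in dom(d')$ and there is some $b$ such that $a \approx b$, $b \in dom(\sigma_1)$. Also for any $a \in dom(d'')$, $d''(a) = d'(a)$. But by definition of $D_{\sigma_1}$, $a \in dom(d)$ iff there is some $b$ such that $a \approx b$, $b \in dom(\sigma_1)$. Therefore $d = d''$ and hence $\psi_{\sigma_2 \to \sigma_1}(\phi_{\sigma_1 \to \sigma_2}(d)) = d$.

For any element $d \in D_{\sigma_2}$, let $d' = \psi_{\sigma_2 \to \sigma_1}(d)$ and $d'' = \phi_{\sigma_1 \to \sigma_2}(d')$. Define a relation $\sim$ on $Subtrees(d'') \times Subtrees(d)$ as follows: for $d_1 \in Subtrees(d'')$, $d_2 \in Subtrees(d)$, $d_1 \sim d_2$ iff either there is some $a \in dom(d')$ such that $d_1 = d''/a$ and $d_2 = d/a$, or there are some $a, b$ such that $a \notin dom(d')$, $a \approx b$, $d_1 = d''/b$ and $d_2 = d/a$. Since $\epsilon \in dom(d')$, $d'' \sim d$. Suppose $d_1 = d''/a$, $d_2 = d/a$ for some $a \in dom(d')$. By definition of $\phi_{\sigma_1 \to \sigma_2}$ and $\psi_{\sigma_2 \to \sigma_1}$, $d''(a) = d'(a) = d(a)$. Suppose $d_1 = d''/b$, $d_2 = d/a$ for some $a \notin dom(d')$, $a \approx b$. Then by definition of $\phi_{\sigma_1 \to \sigma_2}$, $b \in dom(\sigma_2)$ and $d''(b) = nullval(\sigma_2(b))$. By the property of $nullval$, $d''(b) \sqsubseteq^b d(a)$. Therefore in both cases $d_1(\epsilon) \sqsubseteq^b d_2(\epsilon)$. The other conditions of substructure relation (conditions 3-5) can be easily checked by distinguishing cases whether $a \in dom(d')$ or not and using the property of the typing relation in the latter case.

For the monotonicity of $\phi_{\sigma_1 \to \sigma_2}$, let $d_1, d_2 \in D_{\sigma_1}$ and $d'_1 = \phi_{\sigma_1 \to \sigma_2}(d_1)$, $d'_2 = \phi_{\sigma_1 \to \sigma_2}(d_2)$. Suppose there is a substructure relation $\sim$ on $Subtrees(d_1) \times Subtrees(d_2)$. Define a relation $\sim'$ on $Subtrees(d'_1) \times Subtrees(d'_2)$ as follows: $d \sim' d'$ iff either (1) there are $a, b$ such that $d_1/a \sim d_2/b$ and $d = d'_1/a$, $d' = d'_2/b$ or (2) there are $a, b, c$ such that $d_1/a \sim d_2/b$, $a \cdot c \notin dom(d_1)$, $b \cdot c \notin dom(d_2)$, $d = d'_1/a \cdot c$ and $d' = d'_2/b \cdot c$. It can then be checked that $\sim'$ is a substructure relation. For the monotonicity of $\psi_{\sigma_2 \to \sigma_1}$, let $d_1, d_2 \in D_{\sigma_2}$ and $d'_1 = \psi_{\sigma_2 \to \sigma_1}(d_1)$, $d'_2 = \psi_{\sigma_2 \to \sigma_1}(d_2)$. Suppose there is a substructure relation $\sim$ on $Subtrees(d_1) \times Subtrees(d_2)$. Define a relation $\sim'$ on $Subtrees(d'_1) \times Subtrees(d'_2)$ as: $d \sim' d'$ iff there are $a, b$ such that $d_1/a \sim d_2/b$ and $d = d'_1/a$, $d' = d'_2/b$. Then it is easily verified that $\sim'$ is a substructure relation. Then the proposition follows from lemma 5. ∎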
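The inductive characterizations of $Emb$ and $Proj$ on finite types can be read as two mutually inverse-on-one-side recursive functions. The sketch below is illustrative only: the Python encoding and the names `null_of`, `embed` and `project` are assumptions, record types are ordered by field inclusion, and variant types are assumed to carry the same labels on both sides, as in the functor for variants above.

```python
# Assumed encoding:
#   types:        "int" / "string", ("record", {l: t}), ("variant", {l: t}), ("set", t)
#   descriptions: constants, ("null", b), dict records,
#                 ("inj",) or ("inj", label, d), ("set", [d, ...])

def null_of(t):
    """Null(sigma) of theorem 6, for finite types."""
    if isinstance(t, str):
        return ("null", t)
    if t[0] == "record":
        return {l: null_of(s) for l, s in t[1].items()}
    if t[0] == "variant":
        return ("inj",)
    return ("set", [null_of(t[1])])


def embed(d, t1, t2):
    """phi_{t1 -> t2}: pad the fields that t2 adds with null values."""
    if isinstance(t1, str):
        return d                                            # identity on base types
    if t1[0] == "record":
        out = {l: embed(d[l], t1[1][l], t2[1][l]) for l in t1[1]}
        out.update({l: null_of(s) for l, s in t2[1].items() if l not in t1[1]})
        return out
    if t1[0] == "variant":
        if d == ("inj",):
            return d
        return ("inj", d[1], embed(d[2], t1[1][d[1]], t2[1][d[1]]))
    return ("set", [embed(x, t1[1], t2[1]) for x in d[1]])


def project(d, t2, t1):
    """psi_{t2 -> t1}: drop the fields that t1 does not mention."""
    if isinstance(t1, str):
        return d
    if t1[0] == "record":
        return {l: project(d[l], t2[1][l], t1[1][l]) for l in t1[1]}
    if t1[0] == "variant":
        if d == ("inj",):
            return d
        return ("inj", d[1], project(d[2], t2[1][d[1]], t1[1][d[1]]))
    return ("set", [project(x, t2[1], t1[1]) for x in d[1]])
```

With `t1 = ("record", {"Name": "string"})` and `t2 = ("record", {"Name": "string", "Age": "int"})`, projecting after embedding returns the original record, while embedding after projecting replaces a known age by `("null", "int")`, a description with less information, as proposition 11 requires.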

From the inductive characterization of $Emb$ and $Proj$ it is easy to see that all embeddings and projections between finite types are computable functions. This necessary property still holds for general embeddings and projections.

Proposition 12 Elements of $Emb^\infty$ and $Proj^\infty$ are all computable functions.

Proof We first show this for the embeddings in $Emb^\infty$. Let $\sigma_1 \le \sigma_2$ and $d : \sigma_1$. Let $M_d = (Q_d, s_d, F_d, \delta_d, \lambda_d)$ and $M_{\sigma_2} = (Q_{\sigma_2}, s_{\sigma_2}, F_\tau, \delta_{\sigma_2}, \lambda_{\sigma_2})$ be Moore machines representing $d, \sigma_2$ respectively. Let $M = (Q, s, F, \delta, \lambda) = (M_d \times M_{\sigma_2})/\approx$ be the product machine modulo the equivalence relation $\approx$ defined in definition 18. Define $M' = (Q, s, F_d, \delta', \lambda')$ from $M$ as follows:

1. $Q, s$ are same as $M$,
2. $\delta'(q, l)$ is defined and equal to $q'$ iff either $\delta(q, (l, l')) = q'$ and $l \neq \$$ or $\delta(q, (\$, l)) = q'$ and $\lambda(q)$ is not of the form $(x, Variant)$,
3. $\lambda'(q) = f$ iff either $\lambda(q) = (f, g)$ for some $g$ or $\lambda(q) = (\$, g)$ and $f = nullval(g)$.

It can then be checked that $M'(s) = \phi_{\sigma_1 \to \sigma_2}(d)$.

For projections in $Proj^\infty$, let $\sigma_2 \le \sigma_1$ and $d : \sigma_1$. Let $M_d = (Q_d, s_d, F_d, \delta_d, \lambda_d)$ and $M_{\sigma_2} = (Q_{\sigma_2}, s_{\sigma_2}, F_\tau, \delta_{\sigma_2}, \lambda_{\sigma_2})$ be Moore machines representing $d, \sigma_2$ respectively. Let $M = (Q, s, F, \delta, \lambda) = (M_d \times M_{\sigma_2})/\approx$. Define $M' = (Q, s, F_d, \delta', \lambda')$ from $M$ as follows:

1. $Q, s$ are same as $M$,
2. $\delta'(q, l)$ is defined and equal to $q'$ iff $\delta(q, (l, l')) = q'$ and $l' \neq \$$,
3. $\lambda'(q) = f$ iff $\lambda(q) = (f, g)$ for some $g$.

Then by lemma 3, $M'(s) = \psi_{\sigma_1 \to \sigma_2}(d)$. ∎

We now have the following theorem:


Theorem 7 $(\{D_\sigma \mid \sigma \in Dtype^\infty\}, \{[\phi] \mid \phi \in Emb^\infty\})$ is a database domain and a model of $(Dtype^\infty, \le)$.

Proof By proposition 11, for all $\phi \in Emb^\infty$, $[\phi]$ is an embedding. Since $Dtype^\infty$ is a poset with the pairwise bounded join property, conditions 1-4 of database domain are satisfied by the set $\{[\phi] \mid \phi \in Emb^\infty\}$. Condition 5 is shown by proposition 12. ∎

This theorem says that we have completed the construction of a type system for complex database objects and its semantic domain. The type system allows arbitrarily complex objects constructed by records, variants, finite sets and recursion. A join and a projection are available as computable functions on arbitrarily complex structures as given by the equations (1) and (2). Moreover, since these operations have polymorphic type schemes (3) and (4), the result types can always be computed from the types of their arguments without actually computing the results. The following are examples of joins of descriptions in figure 4:

join(Mary1, Mary2) = Mary3

join(Flights, Flown-by) = Schedule-data

The types of the above two joins are working-student and schedule-data respectively, which are computed from the types of their arguments. This property enables us to develop a static type inference system. Another important implication of theorem 7 is that it provides an elegant semantic formulation of the domain of complex database objects endowed with the join and the projection.


5  A  Polymorphic  Language  for  Databases 

We  now  show  that  the  entire  type  system  and  the  semantic  domain  we  have  just  constructed  can  be  integrated 
in  an  ML-like  programming  language.  Such  integration  yields  a  strongly  typed  polymorphic  programming 
language  suitable  for  databases.  An  experimental  programming  language,  Machiavelli  [45],  embodying  the 
integration has been developed at the University of Pennsylvania. In this section, we show how the integration is done by defining a subset of Machiavelli. Readers are referred to [45] for discussions of the advantages of such a polymorphic database language and many examples of database programming in the language.

5.1  Types  and  Expressions 

The first step of the integration is to define the set of types and the set of expressions of the language in such a way that the set of description types $Dtype^\infty$ and the set of descriptions $Dobj^\infty$ can be freely mixed with the other constructs of the language. This is done by simply extending the term languages for $Dtype^\infty$ and $Dobj^\infty$ we have defined to include function type constructors and function expressions.

The set of types $Type$ (ranged over by $t$) of the language is given by the following abstract syntax:

$t ::= b \mid t \to t \mid [l : t, \ldots, l : t] \mid \langle l : t, \ldots, l : t \rangle \mid \{t\} \mid (rec\ v.\ t(v))$

where $b$ stands for the set of base types $B$, and $t(v)$ stands for type expressions possibly containing the symbol $v$. Regarding $t_1 \to t_2$ as a shorthand for $Fun(Domain = t_1, Range = t_2)$, each type expression $t$ denotes the regular tree $M_t(s) \in R(F_\tau \cup \{Fun\})$ where $M_t$ is the Moore machine defined in section 4.1. We regard $t \in Type$ as the corresponding regular tree $M_t(s)$. The set of types that do not contain the function type constructor $\to$ is exactly the set of description types $Dtype^\infty$.

For expressions of the language, in addition to expressions that denote descriptions, we introduce the following expression constructors:

(fn(x) => e) : function abstraction
f(a) : function application
r.l : field selection from a record r
modify(r, l, e) : modification of the l field in r with e
(case v of l_1 of x_1 => e_1, ..., l_n of x_n => e_n) : case analysis for a variant
union(s_1, s_2) : set union
prod(s_1, s_2) : cartesian product of two sets
map(f, s) : map a function f over a set s

The set of expressions $Expr$ (ranged over by $e$) of the language is then given by the following abstract syntax:

e ::= c | x | e(e) | (fn(x) => e) | let x = e in e |
      [l = e, ..., l = e] | e.l | modify(e, l, e) |
      Inj(l = e) | (case e of l of x => e, ..., l of x => e) |
      {e, ..., e} | union(e, e) | prod(e, e) | map(e, e) |
      join(e, e) | project(e, σ) | (rec x. e)

where c stands for constants, x stands for variables, and let x = e in e stands for ML let-expressions. The subset of $Expr$ defined by the following abstract syntax

d ::= c | [l = e, ..., l = e] | Inj(l = e) | {e, ..., e} | (rec x. d(x))

denotes regular trees and corresponds exactly to the set $Dobj^\infty$. We identify an expression d and the corresponding description in $Dobj^\infty$ if d is in the subset specified by the above grammar.

5.2  Type  Inference 

Unlike in an explicitly typed language, the expressions we have defined carry no explicit type information. Types of expressions are completely inferred by a proof system called a type inference system. In [44], a complete type inference system for a language containing database objects without variants and recursive objects is defined. By using the typing relation defined in section 4.4, the type inference system defined in [44] can be extended to the entire set of the above expressions.

In [44], it is also shown that by extending Milner's method [40] for ML type inference with conditions on substitutions, we can define a complete type inference algorithm. The method relies on the solvability of unification of type expressions, the decidability of the ordering on description types, and the computability of least upper bounds of description types. Huet showed [28] the solvability of the unification problem of regular trees and defined a unification algorithm. In section 4.4, we have shown the decidability of the ordering relation and the computability of least upper bounds of description types. Using these two results, the method described in [44] can be extended to the entire language.

5.3  Equality  and  Reduction  Relations  on  Expressions 

On expressions, sets of rules should be defined to represent the equality and the reduction relation on expressions, which determine the dynamic properties of the language. These relations are defined on typed expressions derived in the type inference system. The equality rule (rule scheme) for expressions corresponding to descriptions is given as:

(description) $d_1 : t = d_2 : t$ if $d_1, d_2 \in Dobj^\infty$, $d_1 \preceq d_2$, $d_2 \preceq d_1$

Since $\preceq$ represents the goodness of descriptions, this rule correctly captures the intended equality on descriptions. The equality rules for the join and the projection are defined as:

(join) $join(d_1 : t_1, d_2 : t_2) : t = \phi_{t_1 \to t}(d_1) \sqcup \phi_{t_2 \to t}(d_2)$ if $t_1, t_2 \in Dtype^\infty$, $t_1 \sqcup t_2 = t$

(project) $project(d : t_1, t_2) : t_2 = \psi_{t_1 \to t_2}(d)$ if $t_1, t_2 \in Dtype^\infty$, $t_2 \le t_1$

Combining them with the rules for standard equational reasoning, the standard rules for function applications (the rule $\beta$), let-expressions and primitive operations other than join and project, we have a complete equality relation for the language.

In order to define a reduction relation, we define the notion of normal form on descriptions. For
$d \in Dobj^\infty$, $d$ is in description normal form if for any element $d' \in Subtrees(d)$, if $d'(\epsilon) = Set$ then there
are no $d_1, d_2$ such that $d_1 = d'/elm_i$, $d_2 = d'/elm_j$ for some $i, j$ and $d_1 < d_2$. It can be shown that for any
$d \in Dobj^\infty$, there is some $d'$ such that $d'$ is in description normal form and $d = d'$ in the sense of the above
equality relation. Moreover, such $d'$ can be effectively computed. The reduction rule for descriptions is then
given as:

(description)  $d_1 : t \longrightarrow d_2 : t$  if $d_1, d_2 \in Dobj^\infty$ and $d_2$ is in a description normal form
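
The normal form condition can be checked mechanically in a simplified setting. The sketch below is an illustration under stated assumptions, not the procedure of this paper: it considers a single set whose elements are flat records of atomic values, models the null value as `None`, orders records field by field with null below every value, and reads the comparison in the definition as strict. The function names are hypothetical.

```python
NULL = None  # assumption of this sketch: stands in for the null value of a base type


def leq(r1, r2):
    """r1 is below r2: same labels, and each non-null field of r1 agrees with r2."""
    return r1.keys() == r2.keys() and all(
        value is NULL or value == r2[label] for label, value in r1.items()
    )


def strictly_below(r1, r2):
    """Strict part of the ordering on records."""
    return leq(r1, r2) and not leq(r2, r1)


def is_normal_form(elements):
    """True if no element of the set is strictly below another element."""
    return not any(strictly_below(a, b) for a in elements for b in elements)


# Usage: the first set holds a record and a strictly better description of it,
# so it is not in normal form; the second set holds two incomparable records.
redundant = [{"Name": "Joe", "Age": NULL}, {"Name": "Joe", "Age": 21}]
antichain = [{"Name": "Joe", "Age": 21}, {"Name": "Sue", "Age": NULL}]
```

Which of two comparable elements is dropped to reach a normal form depends on the ordering chosen for sets, so this sketch only tests the condition and does not compute the reduct.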

The reduction rules for the join and the projection are the same as the rules for equality. Combining them with
the rules for standard equational reasoning except the rule for symmetry, the standard rules for function
applications (the rule $\beta$), let-expressions and primitive operations other than join and project, we have a
complete reduction relation for the language. Based on this reduction relation, an operational semantics of
the language is defined. An actual evaluation algorithm for the language can be defined by using the algorithms
for computing least upper bounds of descriptions, embeddings and projections that have been defined in the
proofs of their computability in the previous section.


5.4  Semantics  of  the  Entire  Programming  Language 


In  practice,  it  is  sufficient  to  have  a  type  inference  algorithm  to  type-check  programs  and  an  evaluation 
algorithm  to  compute  results  of  programs.  For  a  better  understanding  of  the  language,  however,  it  is  highly 
desirable  to  construct  a  complete  semantics  of  the  entire  programming  language.  Such  a  semantics  should 
be  useful  for  reasoning  about  various  properties  of  programs  and  for  further  enhancement  of  the  language. 

In addition to the semantics of the database type system we have constructed, a semantics of the entire
language requires a semantics of ML polymorphism. Milner proposed one such semantics [40] using a
universal value domain of an untyped language. In his semantics, a type denotes a subset of the universal
domain. MacQueen et al. extended this semantics to recursive types [36]. However, this semantics does not
agree with the behavior of implicitly typed expressions. (See [43] for an analysis of this problem.) Recently,
Mitchell and Harper showed [41] that there is a one-to-one correspondence between a typing derivation in
the ML type inference system and a term in an explicitly typed language. Along the line of this connection,
it is shown in [43] that a semantics of an explicitly typed language yields a semantics of the corresponding
implicitly typed language supporting ML polymorphism. It is therefore sufficient to construct a semantics
of the explicitly typed version of the language.

We regard expression constructors other than function abstraction and function application as ("curried")
constant functions. For example, the record [Name = "Joe"] is regarded as the application [Name = -]("Joe")
of the constant function [Name = -] to the constant "Joe", and a join $join(d_1, d_2)$ is regarded as the curried
application $join(d_1)(d_2)$ of the constant function join to $d_1, d_2$. Recursive descriptions can also be treated
in this way. The explicitly typed language corresponding to our language is then obtained by explicitly
specifying the type of the parameter in function abstraction, as in (fn(x : t) => e), and replacing each constant by
the corresponding set of typed constants. The resulting language is a typed lambda calculus with constants.
In [10], a framework for a semantics of typed lambda calculi was given. In the framework, the set of types
is generalized to a type algebra allowing arbitrary equations (or constraints). Since the set of regular trees
satisfies their definition of a type algebra, we can use this framework to construct a semantics of the explicitly
typed language. In the framework, a semantic space of a language is a frame $(\mathcal{F}, \bullet, \gamma)$ where: $\mathcal{F}$ is a
Type-indexed set such that each $F_t \in \mathcal{F}$ is non-empty, $\bullet$ is a binary operation
$\bullet : F_{t_1 \to t_2} \times F_{t_1} \to F_{t_2}$ representing the function application, and $\gamma$ is a function that interprets
constants. For our language, we impose the following conditions on a frame $(\mathcal{F}, \bullet, \gamma)$:

1. for each $t \in Dtype^\infty$, $(D_t \cup \{\top\}) \subseteq F_t$, where $D_t$ is the description domain we have constructed and
$\top$ is a distinguished value representing the exception of the computation of join,

2. for $c \in B_b$, $\gamma(c : b) = c$,

3. for $null_b$, $\gamma(null_b : b) = null_b$,

4. for a constant $f : t$ introduced for a term constructor, $\gamma(f : t)$ is the element in $F_t$ satisfying the
intended equations. Such equations are easily defined for each constant based on the structures of the
database domain we have developed in the previous section. For example, the necessary equations for
$\gamma(join : t_1 \to t_2 \to t) \in F_{t_1 \to t_2 \to t}$, where $t = t_1 \sqcup t_2$, are given as:

$$\gamma(join : t_1 \to t_2 \to t) \bullet d_1 \bullet d_2 = \begin{cases} \phi_{t_1 \to t}(d_1) \sqcup \phi_{t_2 \to t}(d_2) & \text{if the lub exists} \\ \top & \text{otherwise} \end{cases}$$

for all $d_1 \in D_{t_1}$, $d_2 \in D_{t_2}$.
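
The treatment of constructors as curried constants and the role of the distinguished exception value can be sketched as follows. This is an illustrative model only, under the assumptions that descriptions are flat records of atomic values, that null is modelled as `None`, and that the frame's application operation is ordinary function application. The names `TOP`, `record_con`, `join_const` and `apply` are hypothetical.

```python
NULL = None     # assumption of this sketch: the null value of a base type
TOP = object()  # assumption of this sketch: the exception value of a failed join


def record_con(label):
    """Curried constant [label = -]: applied to a value, yields a one-field record."""
    return lambda value: {label: value}


def join_const(d1):
    """Curried constant join: join_const(d1)(d2) is the lub of the padded records."""
    def with_second(d2):
        result = {}
        for label in set(d1) | set(d2):
            x, y = d1.get(label, NULL), d2.get(label, NULL)
            if x is NULL:
                result[label] = y
            elif y is NULL or x == y:
                result[label] = x
            else:
                return TOP  # the two descriptions are inconsistent: no lub exists
        return result
    return with_second


def apply(f, a):
    """The binary application operation of the frame, here plain application."""
    return f(a)


# Usage: [Name = "Joe"] built by applying a constant, then two curried joins.
name = apply(record_con("Name"), "Joe")
ok = apply(apply(join_const, name), {"Age": 21})
failed = apply(apply(join_const, name), {"Name": "Sue"})
```

Returning a sentinel rather than raising an error mirrors the case analysis of the equation for join: the result is either the least upper bound or the single exception value.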


The  method  described  in  [10]  can  then  be  applied  to  construct  a  semantics  of  the  explicitly  typed  version  of 
our  language.  A  semantics  of  the  implicitly  typed  language  supporting  ML  polymorphism  can  be  constructed 
by  using  the  method  described  in  [43].  Based  on  the  semantics,  we  can  show  the  strong  soundness  and 
completeness  theorem  for  the  equational  theory  of  our  language  as  is  done  in  [43]. 


6  Conclusion  and  Future  Work

Based on an abstract analysis of the relational data model, we have proposed a framework for the semantics of
types for databases. We characterized the semantic space of an individual type as a poset of descriptions, which
we called a description domain, and the semantic space of the entire type system as a poset of description
domains, which we called a database domain. Based on this framework, we have constructed a concrete
database type system and its semantic domain using regular trees, supporting arbitrarily complex structures
constructed from records, variants, finite sets and recursive definitions. On these complex structures, a
join and a projection are available as typed polymorphic operations. We have also shown that both the
type system and the semantic domain can be uniformly integrated in an ML-like polymorphic programming
language.

In our study of the database type system, we have implicitly assumed that database objects are values. Two
objects are equal if they are equal as values. As we have demonstrated, these value-based database systems
fit nicely into the paradigm of functional programming languages. However, value-based systems have the
disadvantage that it is rather difficult to represent sharing and mutability, which are also important aspects
of database objects. In order to overcome this disadvantage, the notion of "object identities" has been
proposed [7, 37, 33]. In an identity-based system, database objects are represented by their unique identities
associated with their attribute values. For the same reason that we wanted to integrate a value-based database
system into a modern type system of a programming language, we would like to integrate identity-based
database objects into a type system of a programming language. Although the notion of object identities
is intuitively clear and appealing, integrating it into a programming language type system constitutes a
challenge. As demonstrated in [45], the major properties of object identities seem to be captured by the ML
reference type when integrated in a database type system like the one we have developed in this paper.
However, a uniform and elegant integration will require an analysis of the properties of object identities
analogous to what we have done for the structure of value-based complex database objects.